basic_ios::clear: iostream error when reading from file [duplicate] - c++

I've read that <fstream> predates <exception>. Ignoring the fact that exceptions on fstream aren't very informative, I have the following question:
It's possible to enable exceptions on file streams using the exceptions() method.
ifstream stream;
stream.exceptions(ifstream::failbit | ifstream::badbit);
stream.open(filename.c_str(), ios::binary);
Any attempt to open a nonexistent file, a file without the correct permissions, or any other I/O problem will result in an exception. This works very well with an assertive programming style: the file is supposed to be there and be readable, and if those conditions aren't met, we get an exception. If I weren't sure whether the file could safely be opened, I could use other functions to test for it.
But now suppose I try to read into a buffer, like this:
char buffer[10];
stream.read(buffer, sizeof(buffer));
If the stream detects the end-of-file before filling the buffer, it sets the failbit, and an exception is thrown if exceptions were enabled. Why? What's the point of this? I could have verified that by simply testing eof() after the read:
char buffer[10];
stream.read(buffer, sizeof(buffer));
if (stream.eof()) // or stream.gcount() != sizeof(buffer)
// handle eof myself
This design choice prevents me from using standard exceptions on streams and forces me to create my own exception handling for permission or I/O errors. Or am I missing something? Is there any way out? For example, can I easily test whether I can read sizeof(buffer) bytes from the stream before doing so?

The failbit is designed to allow the stream to report that some operation failed to complete successfully. This includes errors such as failing to open the file, trying to read data that doesn't exist, and trying to read data of the wrong type.
The particular case you're asking about is reprinted here:
char buffer[10];
stream.read(buffer, sizeof(buffer));
Your question is why failbit is set when the end-of-file is reached before all of the input is read. The reason is that this means the read operation failed: you asked to read 10 characters, but there weren't that many characters left in the file. Consequently, the operation did not complete successfully, and the stream sets failbit to let you know, even though the characters that were available will still be read.
If you want to do a read operation where you want to read up to some number of characters, you can use the readsome member function:
char buffer[10];
streamsize numRead = stream.readsome(buffer, sizeof(buffer));
This function will read characters up to the end of the file, but unlike read it doesn't set failbit if the end of the file is reached before the characters are read. In other words, it says "try to read this many characters, but it's not an error if you can't; just let me know how much you read." This contrasts with read, which says "I want precisely this many characters, and it's an error if you can't do it." (Be aware, though, that readsome only promises the characters immediately available in the stream's internal buffer, so how much it actually reads is implementation-specific; the rdbuf()->sgetn() approach shown in another answer below is often more dependable for files.)
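If you'd rather keep exceptions enabled for genuine I/O errors but tolerate a short final read, one possible way out (a sketch of my own, not from the original answer) is to catch the failure and inspect the stream; gcount() still reports how many characters the read actually delivered:
#include <fstream>
#include <iostream>
// Read up to `count` bytes from a stream that has exceptions enabled for
// failbit/badbit (as in the question); returns the bytes actually read.
std::streamsize read_allowing_eof(std::istream &stream, char *buffer, std::streamsize count)
{
    try {
        stream.read(buffer, count);  // throws on a short read (failbit) or I/O error (badbit)
    } catch (const std::ios_base::failure &) {
        if (!stream.eof())
            throw;                   // a real error, not just end-of-file: rethrow
        // Short read at EOF: fall through; gcount() tells us what arrived.
    }
    return stream.gcount();
}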
EDIT: An important detail I forgot to mention is that eofbit can be set without triggering failbit. For example, suppose that I have a text file that contains the text
137
without any newlines or trailing whitespace afterwards. If I write this code:
ifstream input("myfile.txt");
int value;
input >> value;
Then at this point input.eof() will return true, because when reading the characters from the file the stream hit the end of the file trying to see if there were any other characters in the stream. However, input.fail() will not return true, because the operation succeeded - we can indeed read an integer from the file.
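A self-contained sketch of that scenario (using a string stream in place of the file so it runs as-is):
#include <iostream>
#include <sstream>
int main()
{
    std::istringstream input("137"); // stands in for the file's contents
    int value;
    input >> value;
    std::cout << "eof:  " << input.eof()  << '\n'  // 1: hit end-of-input while scanning ahead
              << "fail: " << input.fail() << '\n'  // 0: the extraction itself succeeded
              << "value: " << value << '\n';       // 137
    return 0;
}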
Hope this helps!

Using the underlying buffer directly seems to do the trick:
char buffer[10];
streamsize num_read = stream.rdbuf()->sgetn(buffer, sizeof(buffer));

Improving on #absence's answer, here is a function readeof() that does the same as read() but doesn't set failbit on EOF. Real read failures have also been tested, such as a transfer interrupted by hard removal of a USB stick or a link drop while accessing a network share. It has been tested on Windows 7 with VS2010 and VS2013 and on Linux with gcc 4.8.1. On Linux only USB stick removal has been tried.
#include <iostream>
#include <fstream>
#include <stdexcept>
using namespace std;

streamsize readeof(istream &stream, char *buffer, streamsize count)
{
    if (count == 0 || stream.eof())
        return 0;

    streamsize offset = 0;
    streamsize reads;
    do
    {
        // This consistently fails on gcc (linux) 4.8.1 with failbit set on read
        // failure. It apparently never fails on VS2010 and VS2013 (Windows 7).
        reads = stream.rdbuf()->sgetn(buffer + offset, count);

        // This rarely sets failbit on VS2010 and VS2013 (Windows 7) on read
        // failure of the previous sgetn().
        (void)stream.rdstate();

        // On gcc (linux) 4.8.1 and VS2010/VS2013 (Windows 7) this consistently
        // sets eofbit when the stream is at EOF as a consequence of sgetn(). It
        // should also throw if exceptions are enabled (or return otherwise) when
        // the previous rdstate() surfaced a failbit on Windows. On Windows it
        // sets eofbit most of the time even on a real read failure.
        (void)stream.peek();

        if (stream.fail())
            throw runtime_error("Stream I/O error while reading");

        offset += reads;
        count -= reads;
    } while (count != 0 && !stream.eof());

    return offset;
}
#define BIGGER_BUFFER_SIZE 200000000

int main(int argc, char* argv[])
{
    ifstream stream;
    stream.exceptions(ifstream::badbit | ifstream::failbit);
    stream.open("<big file on usb stick>", ios::binary);

    char *buffer = new char[BIGGER_BUFFER_SIZE];
    streamsize reads = readeof(stream, buffer, BIGGER_BUFFER_SIZE);
    if (stream.eof())
        cout << "eof" << endl << flush;

    delete[] buffer;
    return 0;
}
Bottom line: on Linux the behavior is more consistent and meaningful. With exceptions enabled, a real read failure will throw from sgetn(). Windows, on the contrary, treats read failures as EOF most of the time.

Related

Symbol at the end of txt file appears [duplicate]

What is wrong with using feof() to control a read loop? For example:
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    char *path = "stdin";
    FILE *fp = argc > 1 ? fopen(path=argv[1], "r") : stdin;
    if( fp == NULL ){
        perror(path);
        return EXIT_FAILURE;
    }
    while( !feof(fp) ){ /* THIS IS WRONG */
        /* Read and process data from file… */
    }
    if( fclose(fp) != 0 ){
        perror(path);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
What is wrong with this loop?
TL;DR
while(!feof) is wrong because it tests for something that is irrelevant and fails to test for something that you need to know. The result is that you are erroneously executing code that assumes that it is accessing data that was read successfully, when in fact this never happened.
I'd like to provide an abstract, high-level perspective. So continue reading if you're interested in what while(!feof) actually does.
Concurrency and simultaneity
I/O operations interact with the environment. The environment is not part of your program, and not under your control. The environment truly exists "concurrently" with your program. As with all things concurrent, questions about the "current state" don't make sense: There is no concept of "simultaneity" across concurrent events. Many properties of state simply don't exist concurrently.
Let me make this more precise: Suppose you want to ask, "do you have more data". You could ask this of a concurrent container, or of your I/O system. But the answer is generally unactionable, and thus meaningless. So what if the container says "yes" – by the time you try reading, it may no longer have data. Similarly, if the answer is "no", by the time you try reading, data may have arrived. The conclusion is that there simply is no property like "I have data", since you cannot act meaningfully in response to any possible answer. (The situation is slightly better with buffered input, where you might conceivably get a "yes, I have data" that constitutes some kind of guarantee, but you would still have to be able to deal with the opposite case. And with output the situation is certainly just as bad as I described: you never know if that disk or that network buffer is full.)
So we conclude that it is impossible, and in fact unreasonable, to ask an I/O system whether it will be able to perform an I/O operation. The only possible way we can interact with it (just as with a concurrent container) is to attempt the operation and check whether it succeeded or failed. At that moment where you interact with the environment, then and only then can you know whether the interaction was actually possible, and at that point you must commit to performing the interaction. (This is a "synchronisation point", if you will.)
EOF
Now we get to EOF. EOF is the response you get from an attempted I/O operation. It means that you were trying to read or write something, but when doing so you failed to read or write any data, and instead the end of the input or output was encountered. This is true for essentially all the I/O APIs, whether it be the C standard library, C++ iostreams, or other libraries. As long as the I/O operations succeed, you simply cannot know whether further, future operations will succeed. You must always first try the operation and then respond to success or failure.
Examples
In each of the examples, note carefully that we first attempt the I/O operation and then consume the result if it is valid. Note further that we always must use the result of the I/O operation, though the result takes different shapes and forms in each example.
C stdio, read from a file:
for (;;) {
    size_t n = fread(buf, 1, bufsize, infile);
    consume(buf, n);
    if (n == 0) { break; }
}
The result we must use is n, the number of elements that were read (which may be as little as zero).
C stdio, scanf:
for (int a, b, c; scanf("%d %d %d", &a, &b, &c) == 3; ) {
    consume(a, b, c);
}
The result we must use is the return value of scanf, the number of elements converted.
C++, iostreams formatted extraction:
for (int n; std::cin >> n; ) {
    consume(n);
}
The result we must use is std::cin itself: evaluated in a boolean context, it tells us whether the stream is still usable (i.e. not in a failed state).
C++, iostreams getline:
for (std::string line; std::getline(std::cin, line); ) {
    consume(line);
}
The result we must use is again std::cin, just as before.
POSIX, write(2) to flush a buffer:
char const * p = buf;
ssize_t n = bufsize;
for (ssize_t k = bufsize; (k = write(fd, p, n)) > 0; p += k, n -= k) {}
if (n != 0) { /* error, failed to write complete buffer */ }
The result we use here is k, the number of bytes written. The point here is that we can only know how many bytes were written after the write operation.
POSIX getline()
char *buffer = NULL;
size_t bufsiz = 0;
ssize_t nbytes;
while ((nbytes = getline(&buffer, &bufsiz, fp)) != -1)
{
    /* Use nbytes of data in buffer */
}
free(buffer);
The result we must use is nbytes, the number of bytes up to and including the newline (or EOF if the file did not end with a newline).
Note that the function explicitly returns -1 (and not EOF!) when an error occurs or it reaches EOF.
You may notice that we very rarely spell out the actual word "EOF". We usually detect the error condition in some other way that is more immediately interesting to us (e.g. failure to perform as much I/O as we had desired). In every example there is some API feature that could tell us explicitly that the EOF state has been encountered, but this is in fact not a terribly useful piece of information. It is much more of a detail than we often care about. What matters is whether the I/O succeeded, more so than how it failed.
A final example that actually queries the EOF state: Suppose you have a string and want to test that it represents an integer in its entirety, with no extra bits at the end except whitespace. Using C++ iostreams, it goes like this:
std::string input = " 123 "; // example
std::istringstream iss(input);
int value;
if (iss >> value >> std::ws && iss.get() == EOF) {
    consume(value);
} else {
    // error, "input" is not parsable as an integer
}
We use two results here. The first is iss, the stream object itself, to check that the formatted extraction to value succeeded. But then, after also consuming whitespace, we perform another I/O operation, iss.get(), and expect it to fail as EOF, which is the case if the entire string has already been consumed by the formatted extraction.
In the C standard library you can achieve something similar with the strto*l functions by checking that the end pointer has reached the end of the input string.
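As a concrete sketch of that strto*l idiom (the function name and signature here are illustrative, not from the original answer):
#include <cstdlib>
#include <cctype>
// Returns true if `s` is an integer in its entirety, allowing trailing
// whitespace; `value` receives the parsed result. Overflow handling omitted.
bool is_whole_integer(const char *s, long &value)
{
    char *end = nullptr;
    value = std::strtol(s, &end, 10);   // strtol skips leading whitespace itself
    if (end == s)
        return false;                   // no digits were converted at all
    while (std::isspace(static_cast<unsigned char>(*end)))
        ++end;                          // allow trailing whitespace
    return *end == '\0';                // true only if the whole string was consumed
}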
It's wrong because (in the absence of a read error) it enters the loop one more time than the author expects. If there is a read error, the loop never terminates.
Consider the following code:
/* WARNING: demonstration of bad coding technique!! */
#include <stdio.h>
#include <stdlib.h>
FILE *Fopen(const char *path, const char *mode);
int main(int argc, char **argv)
{
    FILE *in;
    unsigned count;

    in = argc > 1 ? Fopen(argv[1], "r") : stdin;
    count = 0;
    /* WARNING: this is a bug */
    while( !feof(in) ) { /* This is WRONG! */
        fgetc(in);
        count++;
    }
    printf("Number of characters read: %u\n", count);
    return EXIT_SUCCESS;
}

FILE * Fopen(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if( f == NULL ) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return f;
}
This program will consistently print one greater than the number of characters in the input stream (assuming no read errors). Consider the case where the input stream is empty:
$ ./a.out < /dev/null
Number of characters read: 1
In this case, feof() is called before any data has been read, so it returns false. The loop is entered, fgetc() is called (and returns EOF), and count is incremented. Then feof() is called and returns true, causing the loop to abort.
This happens in all such cases. feof() does not return true until after a read on the stream encounters the end of file. The purpose of feof() is NOT to check whether the next read will reach the end of file. The purpose of feof() is to determine the status of a previous read, distinguishing between an error condition and the end of the data stream. If fread() returns 0, you must use feof()/ferror() to decide whether an error occurred or whether all of the data was consumed. Similarly if fgetc() returns EOF. feof() is only useful after fread() has returned zero or fgetc() has returned EOF; before that happens, feof() will always return 0.
It is always necessary to check the return value of a read (an fread(), an fscanf(), or an fgetc()) before calling feof().
Even worse, consider the case where a read error occurs. In that case, fgetc() returns EOF, feof() returns false, and the loop never terminates. In all cases where while(!feof(p)) is used, there must be at least a check inside the loop for ferror(), or at the very least the while condition should be replaced with while(!feof(p) && !ferror(p)) or there is a very real possibility of an infinite loop, probably spewing all sorts of garbage as invalid data is being processed.
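For illustration, a corrected version of the counting program loops on the result of fgetc() itself and consults ferror() only afterwards (a sketch, reading stdin for simplicity):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    unsigned count = 0;
    int c;
    /* Loop on the result of the read, not on feof(). */
    while( (c = fgetc(stdin)) != EOF ) {
        count++;
    }
    /* Only now do feof()/ferror() tell us why the loop ended. */
    if( ferror(stdin) ) {
        perror("stdin");
        return EXIT_FAILURE;
    }
    printf("Number of characters read: %u\n", count);
    return EXIT_SUCCESS;
}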
So, in summary, although I cannot state with certainty that there is never a situation in which it may be semantically correct to write while(!feof(f)) (there would still have to be another check inside the loop, with a break, to avoid an infinite loop on a read error), it is almost certainly always wrong. And even if a case ever arose where it was correct, it is so idiomatically wrong that it would not be the right way to write the code. Anyone seeing that code should immediately hesitate and say, "that's a bug". And possibly slap the author (unless the author is your boss, in which case discretion is advised.)
No, it's not always wrong. If your loop condition is "while we haven't tried to read past end of file", then you use while (!feof(f)). This is, however, not a common loop condition; usually you want to test for something else (such as "can I read more"). while (!feof(f)) isn't wrong, it's just used wrong.
feof() indicates whether one has tried to read past the end of file. That means it has little predictive power: if it is true, you are sure the next input operation will fail (though you aren't sure the previous one failed, by the way), but if it is false, you aren't sure the next input operation will succeed. Moreover, input operations may fail for reasons other than the end of file (a format error for formatted input; a pure I/O failure, such as a disk failure or a network timeout, for all input kinds). So even if end-of-file were predictive (and anybody who has tried to implement Ada's end-of-file test, which is predictive, will tell you it can be complex if you need to skip spaces, and that it has undesirable effects on interactive devices, sometimes forcing the input of the next line before starting the handling of the previous one), you would still have to be able to handle a failure.
So the correct idiom in C is to loop with the IO operation success as loop condition, and then test the cause of the failure. For instance:
while (fgets(line, sizeof(line), file)) {
    /* Note that fgets doesn't strip the terminating \n; checking for its
       presence allows handling lines longer than sizeof(line) (not shown here). */
    ...
}
if (ferror(file)) {
    /* IO failure */
} else if (feof(file)) {
    /* format error (not possible with fgets, but would be with fscanf) or end of file */
} else {
    /* format error (not possible with fgets, but would be with fscanf) */
}
feof() is not very intuitive. In my very humble opinion, the FILE's end-of-file state should be set to true as soon as any read operation reaches the end of the file. Instead, you have to manually check whether the end of file has been reached after each read operation. For example, something like this will work when reading from a text file using fgetc():
#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *in = fopen("testfile.txt", "r");
    while(1) {
        int c = fgetc(in);   /* int, not char: fgetc returns int so EOF is representable */
        if (feof(in)) break;
        printf("%c", c);
    }
    fclose(in);
    return 0;
}
It would be great if something like this would work instead:
#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *in = fopen("testfile.txt", "r");
    while(!feof(in)) {
        printf("%c", fgetc(in));
    }
    fclose(in);
    return 0;
}

fstream never reaches eof [duplicate]


c++ is it safe to unget on fstreams

Suppose input.txt is 1 byte text file:
std::ifstream fin("input.txt", std::ios::in);
fin.get(); // 1st byte extracted
fin.get(); // try to extract 2nd byte
std::cout << fin.eof(); // eof is triggered
fin.unget(); // return back
std::cout << fin.eof(); // eof is now reset
fin.get(); // try to extract 2nd byte, eof assumed
std::cout << fin.eof(); // no eof is triggered
It seems like unget() resets the eof flag, and it also seems to break the file position. Am I doing something wrong?
eof is not set, but neither is good: the stream is ignoring operations because it's in a failure mode.
I cannot recall what unget is supposed to do after EOF, but in my test unget goes right back into the failure state if I use clear() to allow a retry.
http://ideone.com/JkDrwG
It's usually better to use your own buffer. Putback is a hack.
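As a sketch of that own-buffer approach (the names here are illustrative, not from the original post): slurp the data into your own array once, and then "get" and "unget" become plain index arithmetic with no stream flags to trip over.
#include <fstream>
#include <iterator>
#include <vector>
int main()
{
    std::ifstream fin("input.txt", std::ios::binary);
    // Read the whole file into our own buffer once...
    std::vector<char> buf((std::istreambuf_iterator<char>(fin)),
                          std::istreambuf_iterator<char>());
    // ...then stepping forwards and backwards is just arithmetic.
    std::size_t pos = 0;
    if (pos < buf.size()) { char c = buf[pos++]; (void)c; } // "get"
    if (pos > 0) { --pos; }                                 // "unget"
    return 0;
}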

Why does my program produce different results on Windows and Linux, about file reading with ifstream?

I have a program shown as follows. For it I have several questions:
1). Why does it produce different results on different platforms? I'll paste the screen-shots later.
2). I'm using a fail() method to check if the "file.read()" failed. Is this correct? I use fail() method because this web page says this:
The function returns true if either the failbit or the badbit is set. At least one of these flags is set when some error other than reaching the End-Of-File occurs during an input operation.
But later I read this page about istream::read() here. It says the eofbit and failbit would always be set at the same time. Does this mean that a normal EOF situation would also result in fail() returning true? This seems to conflict with "other than reaching the End-Of-File occurs".
Could anyone help me clarify how I am supposed to use these methods? Should I use bad() instead?
My program
#include <iostream>
#include <fstream>
#include <cstring>   // for ::memset
using namespace std;

#ifdef WIN32
char * path="C:\\Workspace\\test_file.txt";
#else
char * path="/home/robin/Desktop/temp/test_file.txt";
#endif

int main(int argc, char * argv[])
{
    ifstream file;
    file.open(path);
    if (file.fail())
    {
        cout << "File open failed!" << endl;
        return -1; // If the file open fails, quit!
    }

    // Calculate the total length of the file so I can allocate a buffer
    file.seekg(0, std::ios::end);
    size_t fileLen = file.tellg();
    cout << "File length: " << fileLen << endl;
    file.seekg(0, std::ios::beg);

    // Now allocate the buffer
    char * fileBuf = new (std::nothrow) char[fileLen+1];
    if (NULL == fileBuf)
        return -1;
    ::memset((void *)fileBuf, 0, fileLen+1); // Zero the buffer

    // Read the file into the buffer
    file.read(fileBuf, fileLen);
    cout << "eof: " << file.eof() << endl
         << "fail: " << file.fail() << endl
         << "bad: " << file.bad() << endl;
    if (file.fail())
    {
        cout << "File read failed!" << endl;
        delete [] fileBuf;
        return -1;
    }

    // Close the file
    file.close();
    // Release the buffer
    delete [] fileBuf;
    return 0;
}
The test_file.txt content (shown with "vim -b"; it's a very simple file):
Result on Windows (Visual Studio 2008 SP1):
Result on Linux (gcc 4.1.2):
Does this mean that a normal EOF situation would also result in that fail() returns true? This seems to conflict with "other than reaching the End-Of-File occurs".
I recommend using a reference that isn't full of mistakes.
http://en.cppreference.com/w/cpp/io/basic_ios/fail says:
Returns true if an error has occurred on the associated stream. Specifically, returns true if badbit or failbit is set in rdstate().
And the C++ standard says:
Returns: true if failbit or badbit is set in rdstate().
There's no "other than end-of-file" thing. An operation that tries to read past the end of the file, will cause failbit to set as well. The eofbit only serves to distinguish that specific failure reason from others (and that is not as useful as one might think at first).
I'm using a fail() method to check if the "file.read()" failed. Is this correct?
You should simply test with conversion to bool.
if(file) { // file is not in an error state
It's synonymous with !fail(), but it's more usable because you can use it to test the result of a read operation directly, without extra parentheses (things like !(stream >> x).fail() get awkward):
if(file.read(fileBuf, fileLen)) { // read succeeded
You will notice that all read operations on streams return the stream itself, which is what allows you to do this.
Why does it produce different results on different platforms?
The difference you're seeing between Windows and Linux is because the file is open in text mode: newline characters will be converted silently by the implementation. This means that the combination "\r\n" (used in Windows for newlines) will be converted to a single '\n' character in Windows, making the file have only 8 characters. Note how vim shows a ^M at the end of the first line: that's the '\r' part. In Linux a newline is just '\n'.
You should open the file in binary mode if you want to preserve the original as is:
file.open(path, std::ios_base::in | std::ios_base::binary);
I guess the problem here with the different results is the DOS (Windows) vs. UNIX text-file convention.
In DOS, a line ends with <CR><LF>, and in text mode this pair is read/written as a single '\n'. Thus, on Windows your read reaches the end of the file, but on UNIX it does not, since there is one character left.
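A small experiment along those lines (the file name is hypothetical): on Windows the text-mode count comes out smaller because each "\r\n" collapses to '\n', while on Linux both counts match.
#include <fstream>
#include <iostream>
// Count how many characters can actually be read in a given open mode.
static std::streamsize count_readable(const char *path, std::ios::openmode mode)
{
    std::ifstream f(path, mode);
    std::streamsize n = 0;
    while (f.get() != std::ifstream::traits_type::eof())
        ++n;
    return n;
}
int main()
{
    const char *path = "test_file.txt"; // hypothetical test file
    std::cout << "text mode:   " << count_readable(path, std::ios::in) << '\n'
              << "binary mode: " << count_readable(path, std::ios::in | std::ios::binary) << '\n';
    return 0;
}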

Why is failbit set when eof is found on read?
