file reading: feof() for binary files - c++

I am reading a binary file, and when it reaches the end, it seems the loop is not terminated by the feof() function. Is it because there is no EOF character for binary files? If so, how can I solve it?
Currently my code is using a while loop:
while (!feof(f))
When it reaches the end of the file at position 5526900, it doesn't stop. It just keeps trying to read, and I am stuck in the loop.
Can anyone tell me why, and how to solve it?
Thanks

You should not use feof() to loop on - instead, use the return value of fread() - loop until it returns zero. This is easy to see if you consider reading an empty file - feof() returns the EOF status AFTER a read operation, so it will always try to read bogus data if used as a loop control.
I don't know why so many people think feof() (and the eof() member of C++ streams) can predict if the next read operation will succeed, but believe me, they can't.

Related

C++: end-of-file interpretation when using std::cin as a condition

I know that we can use std::cin as a condition, for example, in
while (std::cin >> value)
Using std::cin as a condition will call a member function std::ios::operator bool. It says that it "returns whether an error flag is set (either failbit or badbit)", which does not include eofbit. Despite this, passing end-of-file (by Ctrl+d) terminates the loop. Why? Can failbit or badbit also set an eofbit?
I also found this explanation, but in C++ Reference it specifically says that "this function does not return the same as member good"
The loop above does not test for end of file. It tests for failure to read a value; end of file is just one possible cause of this. Even end of file does not necessarily cause a failure to read a value: imagine reading an integer where the digits are terminated by the end of file. You still read an integer even though you hit the end of file.
The bottom line is that failure to read a value for any reason sets the fail bit, and this loop tests for that.

Can "eof" be set in ofstream?

I could find hardly any information on this. Can eofbit be set on an ofstream (meaning ofstream_instance.eof() is true), and if so, under what circumstances?
I am more interested in an independent ofstream, one that is not associated with an ifstream within some fstream, so that the "shared" eofbit can't be set by the ifstream (if something like that is possible).
If I simply write to a file and there is no space on disk, or the operating system does not provide more space for the writing, then I'd expect only failbit or badbit to be set. Reaching end of file while writing does not make sense to me. However, I was not able to find any discussion of this.
No. eof() returns the eofbit, which has no real meaning for an output stream with no associated input stream.
eofbit indicates that an input operation reached the end of an input sequence
[ios.types] / 3.1, Table 107
The actions that set eofbit are enumerated here, and they all act on input streams only.
We could imagine some weird implementation-specific scenario in which EOF (as opposed to some other error condition) would be hit while writing to a file - maybe there is a fixed-size file buffer we are writing to through some OS functions - but as far as I know the standard library abstractions do not deal with that case, and I have never seen or heard of such an API in the first place.

C++ istream::peek - shouldn't it be nonblocking?

It seems well accepted that the istream::peek operation is blocking.
The standard, though arguably a bit ambiguous, leans towards nonblocking behavior. peek calls sgetc in turn, whose behavior is:
"The character at the current position of the controlled input sequence, as a value of type int.
If there are no more characters to read from the controlled input sequence, the function returns the end-of-file value (EOF)."
It doesn't say "If there are no more characters.......wait until there are"
Am I missing something here? Or are the peek implementations we use just kinda wrong?
The controlled input sequence is the file (or whatever) from which you're reading. So if you're at end of file, it returns EOF. Otherwise it returns the next character from the file.
I see nothing here that's ambiguous at all--if it needs a character that hasn't been read from the file, then it needs to read it (and wait till it's read, and return it).
If you're reading from something like a socket, then it's going to wait until data arrives (or the network stack detects EOF, such as the peer disconnecting).
The description from cppreference.com might be clearer than the one in your question:
"Ensures that at least one character is available in the input area by [...] reading more data in from the input sequence (if applicable)."
"if applicable" does apply in this case; and "reading data from the input sequence" entails waiting for more data if there is none and the stream is not in an EOF or other error state.
When I get confused about console input I remind myself that console input can be redirected to come from a file, so the behavior of the keyboard more or less mimics the behavior of a file. When you try to read a character from a file, you can get one of two results: you get a character, or you get EOF because you've reached the end of the file and there are no more characters to be read. Same thing for keyboard input: either you get a character, or you get EOF because you've reached the end of the file.
With a file, there is no notion of waiting for more characters: either a file has unread characters or it doesn't. Same thing for the keyboard. So if you haven't reached EOF on the keyboard, reading a character returns the next character. You reach EOF on the keyboard by typing whatever character your system recognizes as EOF; on Unix systems that's ctrl-D, on Windows that's ctrl-Z. If you haven't reached EOF, there are more characters to be read.

How does feof() actually know when the end of file is reached?

I'm a beginner in C++ and trying to better understand feof(). I've read that feof() flag is set to true only after trying to read past the end of a file so many times beginners will read once more than they were expecting if they do something like while(!feof(file)). What I'm trying to understand though, is how does it actually interpret that an attempt has been made to read past the end of the file? Is the entire file already read in and the number of characters already known or is there some other mechanism at work?
I realize this may be a duplicate question somewhere, but I've been unable to find it, probably because I don't know the best way to word what I'm asking. If there is an answer already out there a link would be much appreciated. Thanks.
Whatever else the C++ library does, eventually it has to read from the file. Somewhere in the operating system, there is a piece of code that eventually handles that read. It obtains from the filesystem the length of the file, stored the same way the filesystem stores everything else. Knowing the length of the file, the position of the read, and the number of bytes to be read, it can make the determination that the low-level read hits the end of the file.
When that determination is made, it is passed up the stack. Eventually, it gets to the standard library which records internally that the end of file has been reached. When a read request into the library tries to go past that recorded end, the EOF flag is set and feof will start returning true.
feof() is a part of the standard C library buffered I/O. Since it's buffered, fread() pre-reads some data (definitely not the whole file, though). If, while buffering, fread() detects EOF (the underlying OS routine signals this in its own way, typically by returning a byte count of zero), it sets a flag on the FILE structure. feof() simply checks for that flag. So feof() returning true essentially means "a previous read attempt encountered end of file".
How EOF is detected is OS/FS-specific and has nothing to do whatsoever with the C library/language. The OS has some interface to read data from files. The C library is just a bridge between the OS and the program, so you don't have to change your program if you move to another OS. The OS knows how the files are stored in its filesystem, so it knows how to detect EOF. My guess is that typically it is performed by comparing the current position to the length of the file, but it may be not that easy and may involve a lot of low-level details (for example, what if the file is on a network drive?).
An interesting question is what happens when the stream is at the end, but it was not yet detected by any reads. For example, if you open an empty file. Does the first call to feof() before any fread() return true or false? The answer is probably false. The docs aren't terribly clear on this subject:
"This indicator is generally set by a previous operation on the stream that attempted to read at or past the end-of-file."
It sounds as if a particular implementation may choose some other unusual ways to set this flag.
Most file systems maintain meta information about the file (including its size), and an attempt to read past the end results in the feof flag being set. Others, for instance old or lightweight file systems, set feof when they come to the last byte of the last block in the chain.
How does feof() actually know when the end of file is reached?
When code attempts to read past the last character.
Depending on the file type, the last character is not necessarily known until an attempt to read past it occurs and no character is available.
Sample code demonstrating feof() going from 0 to 1
#include <stdio.h>

void ftest(int n) {
    FILE *ostream = fopen("tmp.txt", "w");
    if (ostream) {
        while (n--) {
            fputc('x', ostream);
        }
        fclose(ostream);
    }
    FILE *istream = fopen("tmp.txt", "r");
    if (istream) {
        char buf[10];
        printf("feof() %d\n", feof(istream));
        printf("fread %zu\n", fread(buf, 1, 10, istream));
        printf("feof() %d\n", feof(istream));
        printf("fread %zu\n", fread(buf, 1, 10, istream));
        printf("feof() %d\n", feof(istream));
        puts("");
        fclose(istream);
    }
}

int main(void) {
    ftest(9);
    ftest(10);
    return 0;
}
Output
feof() 0
fread 9    // 10-character read attempted, 9 were read
feof() 1   // eof is set because the previous read attempted to read past the 9th (last) char
fread 0
feof() 1

feof() 0
fread 10   // 10-character read attempted, 10 were read
feof() 0   // eof is still clear because no attempt was made to read past the 10th (last) char
fread 0
feof() 1
The end-of-file indicator is set when a read function tries to read past the last byte of the file; feof() only reports that indicator, and there is no EOF character stored in the file. So when the last item is read, nothing has tried to read past the end yet. Since the indicator is not set, feof() returns zero and the flow enters the while loop again. This time fgets() finds nothing left to read, returns NULL, and sets the end-of-file indicator. feof() then sees the indicator and returns a non-zero value, which breaks the while loop.

C++ istream: Is gcount() always set after a read() even if it fails?

I am reading some data using an istream and read(). I would like to know if I can just test gcount() for the bytes or if I need to test some combination of good(), eof(), etc before calling gcount(). In other words, is gcount() always set after a read() even if that read failed due to EOF or some other internal problem?
Also if this is described in the standard or somewhere that you can cite. I'm using cplusplus.com as a reference and it says that gcount "Returns the number of characters extracted by the last unformatted input operation performed on the object." Can I interpret statements like "last operation" to mean last operation, whatever the outcome?
Is gcount() always set after a read() even if that read failed due to EOF or some other internal problem?
Yes
gcount()'s job is solely to return the number of characters extracted by the last unformatted input operation. The Standard makes no distinction between the value of gcount() when an extraction succeeds and when it fails. And obviously, if the input operation could not extract any characters, then the value will be 0.
So all you need to do to test whether an extraction succeeded is to use the stream itself (the result of read()) as the condition. Use gcount() in the condition only if you wish to determine whether a certain number of characters was extracted.