Why is ::feof() different from ::_eof(::fileno())? - c++

DISCLAIMER: Don't use ::feof() as your loop condition. For example, see the answer to: file reading: feof() for binary files
However, I have "real" code that demonstrates a problem that does not use ::feof() as my loop condition, but LOGICALLY, this is the easiest way to demonstrate the problem.
Consider the following: We iterate a character stream one-at-a-time:
FILE* my_file;
// ...open "my_file" for reading...
int c;
while(0 == ::feof(my_file))
{ // We are not at EOF
c = ::getc(my_file);
// ...process "c"
}
The above code works as expected: The file is processed one-char-at-a-time, and upon EOF, we drop out.
HOWEVER, the following has unexpected behavior:
FILE* my_file;
// ...open "my_file" for reading...
int c;
while(0 == ::_eof(::fileno(my_file)))
{ // We are not at EOF
c = ::getc(my_file);
// ...process "c"
}
I would have expected them to perform the same. ::fileno() properly returns the (integer) file descriptor every time. However, the test ::_eof(::fileno(my_file)) works exactly once, and then returns 1 (indicating an EOF) on the second attempt.
I do not understand this.
I suppose it is conceivable that ::feof() is "buffered" (so it works correctly) while ::_eof() is "un-buffered" and thinks the whole file is "read-in" already (because the whole file would have fit into the first block read in from disk). However, that can't possibly be true given the purpose of those functions. So, I'm really at a loss.
What's going on?
(Files are opened as "text", are ASCII text files with about a dozen lines, MSVS2008, Win7/64.)

I suppose it is conceivable that ::feof() is "buffered" (so it works
correctly) while ::_eof() is "un-buffered" and thinks the whole file
is "read-in" already (because the whole file would have fit into the
first block read in from disk). However, that can't possibly be true
given the purpose of those functions. So, I'm really at a loss.
I don't know why you would think it "can't possibly be true given the purpose of those functions." The two functions are meant to operate on files that are opened and operated on in different ways, so they are not interchangeable.
In fact, that is exactly what is happening. Try this:
FILE* my_file;
// ...open "my_file" for reading...
int c;
while(0 == ::_eof(::fileno(my_file)))
{ // We are not at EOF
c = ::getc(my_file);
long offset1 = ftell(my_file);
long offset2 = _tell(fileno(my_file));
if (offset1 != offset2)
{
//here you will see that the file pointers are different
//which means that _eof and feof will fire true under different conditions
}
// ...process "c"
}
I will try to elaborate a bit based on your comment.
When you call fopen, you get back a pointer to a file stream. The underlying stream object keeps its own file pointer, which is separate from the actual file pointer associated with the underlying file descriptor.
When you call _eof you are asking if you have reached the end of the actual file. When you call feof, you are asking if you have reached the end of the file stream. Since file streams are usually buffered, the end of the file is reached before the end of the stream.
I'm still trying to understand your answer below, and what the purpose
is for _eof() if it always returns 1 even when you didn't read
anything (after the first char).
To answer this question, the purpose of _eof is to determine if you have reached the end of the file when using _open and _read to work directly with file descriptors, not when you use fopen and fread or getc to work with file streams.
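The gap between the two positions can be observed directly. The following is a minimal sketch for POSIX systems, where lseek() on the descriptor stands in for MSVC's _tell(); the helper names and the file name used to try it out are made up for illustration. It reads one character through the stream and reports both positions.

```cpp
#include <cassert>
#include <cstdio>
#include <utility>
#include <unistd.h>   // lseek (POSIX); on MSVC, _tell(_fileno(f)) plays this role

// Hypothetical helper: create a small sample file to experiment with.
void write_sample(const char* path, int n) {
    std::FILE* f = std::fopen(path, "wb");
    assert(f != nullptr);
    for (int i = 0; i < n; ++i) std::fputc('x', f);
    std::fclose(f);
}

// Hypothetical helper: read one character through the FILE* and return
// {stream position, descriptor position}.
std::pair<long, long> offsets_after_one_getc(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    assert(f != nullptr);
    (void)std::getc(f);                    // the stream hands out one byte...
    long stream_pos = std::ftell(f);       // ...so the stream position is 1
    // The descriptor position reflects how much the library has already
    // pulled into its buffer, which may be far more than one byte.
    long fd_pos = static_cast<long>(lseek(fileno(f), 0, SEEK_CUR));
    std::fclose(f);
    return {stream_pos, fd_pos};
}
```

On a typical buffered implementation and a file smaller than the buffer, the second value is likely to equal the file size after the very first getc(), which is exactly the point at which a descriptor-level end-of-file test starts reporting EOF.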

Related

Reading and writing the same file simultaneously with C++

I'm trying to read and write a file as I loop through its lines. At each line, I will do an evaluation to determine whether I want to write it into the file or skip it and move on to the next line. This is basically a skeleton of what I have so far.
void readFile(char* fileName)
{
char line[1024];
fstream file("test.file", ios::in | ios::out);
if(file.is_open())
{
while(file.getline(line,MAX_BUFFER))
{
//evaluation
file.seekg(file.tellp());
file << line;
file.seekp(file.tellg());
}
}
}
As I'm reading in the lines, I seem to be having issues with the starting index of the string copied into the line variable. For example, I may be expecting the string in the line variable to be "000/123/FH/" but it actually goes in as "123/FH/". I suspect that I have an issue with file.seekg(file.tellp()) and file.seekp(file.tellg()) but I am not sure what it is.
It is not clear from your code [1] and problem description what is in the file and why you expect "000/123/FH/", but I can state that the getline function is a buffered input, and you don't have code to access the buffer. In general, it is not recommended to use buffered and unbuffered i/o together because it requires deep knowledge of the buffer mechanism and then relies on that mechanism not to change as libraries are upgraded.
You appear to want to do byte or character[2] level manipulation. For small files, you should read the entire file into memory, manipulate it, and then overwrite the original, requiring an open, read, close, open, write, close sequence. For large files you will need to use fread and/or some of the other lower level C library functions.
The best way to do this, since you are using C++, is to create your own class that handles reading up to and including a line separator [3] into one of the off-the-shelf circular buffers (that use malloc or a plug-in allocator as in the case of STL-like containers) or a circular buffer you develop as a template over a statically allocated array of bytes (if you want high speed and low resource utilization). The size will need to be at least as large as the longest line in the latter case. [4]
Either way, you would want to extend the class to open the file in binary mode and expose the desired methods to do the line-level manipulations on an arbitrary line. Some say (and I personally agree) that an advantage of class encapsulation in C++ is that classes are easier to test carefully. Such a line manipulation class would encapsulate the random access C functions and unbuffered i/o and leave open the opportunity to maximize speed, while allowing for plug-and-play usage in systems and applications.
Notes
[1] The seeking of the current position is just testing the functions and does not yet, in the current state of the code, re-position the current file pointer.
[2] Note that there is a difference between character and byte level manipulations in today's computing environment where utf-8 or some other unicode standard is now more common than ASCII in many domains, especially that of the web.
[3] Note that line separators are dependent on the operating system, its version, and sometimes settings.
[4] The advantage of circular buffers in terms of speed is that you can read more than one line using fread at a time and use fast iteration to find the next end of line.
Taking inspiration from Douglas Daseeco's response, I resolved my issue by simply reading the existing file, writing its lines into a new file, then renaming the new file to overwrite the original file. Below is a skeleton of my solution.
char line[1024];
ifstream inFile("test.file");
ofstream outFile("testOut.file");
if(inFile.is_open() && outFile.is_open())
{
while(inFile.getline(line,1024))
{
// do some evaluation
if(keep)
{
outFile << line;
outFile << "\n";
}
}
inFile.close();
outFile.close();
rename("testOut.file","test.file");
}
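The same read, filter, rename idea can be written without a fixed-size line buffer. This is only a sketch under assumptions: the function name filter_file, the ".tmp" suffix, and the predicate parameter are invented for illustration and are not part of the original solution.

```cpp
#include <cassert>
#include <cstdio>      // std::rename
#include <fstream>
#include <functional>
#include <string>

// Hypothetical helper: keep only the lines for which keep(line) is true.
// Writes to a temporary file, then renames it over the original.
bool filter_file(const std::string& path,
                 const std::function<bool(const std::string&)>& keep) {
    assert(!path.empty());
    const std::string tmp = path + ".tmp";   // assumed temporary name
    {
        std::ifstream in(path);
        std::ofstream out(tmp);
        if (!in || !out) return false;

        std::string line;
        while (std::getline(in, line)) {     // the read itself is the loop test
            if (keep(line)) out << line << '\n';
        }
        out.flush();
        if (!out) return false;              // a write failed somewhere
    }   // both streams are closed here, before the rename
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

A call could look like filter_file("test.file", [](const std::string& s) { return !s.empty(); }). Note that on Windows rename() fails if the destination already exists, so the original would have to be removed first there.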
Because you are reading from and writing to the same file, you might end up with duplicate lines in the file.
You could find this very useful. Imagine reaching the while loop for the first time, starting from the beginning of the file, and calling file.getline(line, MAX_BUFFER). The get pointer (for reading) now moves past the line just read, which is at most MAX_BUFFER - 1 characters from the beginning of the file (your starting point).
After you've determined that you want to write back to the file, seekp() lets you specify the location to write to relative to a reference point. Syntax: file.seekp(num_bytes, ref); where ref is ios::beg (beginning), ios::end (end), or ios::cur (current position in file).
As in your code, after reading, find a way to use the length of the line just read to refer to a location with respect to a reference.
while(file.getline(line,MAX_BUFFER))
{
...
if(writeBack) // for some reason you want to write back
{
// set put-pointer to location for writing
// (ref is ios::beg, ios::cur or ios::end)
file.seekp(num_bytes, ref);
file << line;
}
//set get-pointer to desired location for the next read
file.seekg(num_bytes, ref);
}

How does feof() actually know when the end of file is reached?

I'm a beginner in C++ and trying to better understand feof(). I've read that feof() flag is set to true only after trying to read past the end of a file so many times beginners will read once more than they were expecting if they do something like while(!feof(file)). What I'm trying to understand though, is how does it actually interpret that an attempt has been made to read past the end of the file? Is the entire file already read in and the number of characters already known or is there some other mechanism at work?
I realize this may be a duplicate question somewhere, but I've been unable to find it, probably because I don't know the best way to word what I'm asking. If there is an answer already out there a link would be much appreciated. Thanks.
Whatever else the C++ library does, eventually it has to read from the file. Somewhere in the operating system, there is a piece of code that eventually handles that read. It obtains from the filesystem the length of the file, stored the same way the filesystem stores everything else. Knowing the length of the file, the position of the read, and the number of bytes to be read, it can make the determination that the low-level read hits the end of the file.
When that determination is made, it is passed up the stack. Eventually, it gets to the standard library which records internally that the end of file has been reached. When a read request into the library tries to go past that recorded end, the EOF flag is set and feof will start returning true.
feof() is a part of the standard C library buffered I/O. Since it's buffered, fread() pre-reads some data (definitely not the whole file, though). If, while buffering, fread() detects EOF (the underlying OS routine reports that no more data is available; POSIX read(), for example, returns 0), it sets a flag on the FILE structure. feof() simply checks for that flag. So feof() returning true essentially means “a previous read attempt encountered end of file”.
How EOF is detected is OS/FS-specific and has nothing to do whatsoever with the C library/language. The OS has some interface to read data from files. The C library is just a bridge between the OS and the program, so you don't have to change your program if you move to another OS. The OS knows how the files are stored in its filesystem, so it knows how to detect EOF. My guess is that typically it is performed by comparing the current position to the length of the file, but it may be not that easy and may involve a lot of low-level details (for example, what if the file is on a network drive?).
An interesting question is what happens when the stream is at the end, but it was not yet detected by any reads. For example, if you open an empty file. Does the first call to feof() before any fread() return true or false? The answer is probably false. The docs aren't terribly clear on this subject:
This indicator is generally set by a previous operation on the stream
that attempted to read at or past the end-of-file.
It sounds as if a particular implementation may choose some other unusual ways to set this flag.
Most file systems maintain meta information about the file (including its size), and an attempt to read past the end of the file results in the feof flag being set. Others, for instance old or lightweight file systems, set feof when they come to the last byte of the last block in the chain.
How does feof() actually know when the end of file is reached?
When code attempts to read past the last character.
Depending on the file type, the last character is not necessarily known until an attempt to read past it occurs and no character is available.
Sample code demonstrating feof() going from 0 to 1
#include <stdio.h>
void ftest(int n) {
FILE *ostream = fopen("tmp.txt", "w");
if (ostream) {
while (n--) {
fputc('x', ostream);
}
fclose(ostream);
}
FILE *istream = fopen("tmp.txt", "r");
if (istream) {
char buf[10];
printf("feof() %d\n", feof(istream));
printf("fread %zu\n", fread(buf, 1, 10, istream));
printf("feof() %d\n", feof(istream));
printf("fread %zu\n", fread(buf, 1, 10, istream));
printf("feof() %d\n", feof(istream));
puts("");
fclose(istream);
}
}
int main(void) {
ftest(9);
ftest(10);
return 0;
}
Output
feof() 0
fread 9 // 10 character read attempted, 9 were read
feof() 1 // eof is set as previous read attempted to read past the 9th or last char
fread 0
feof() 1
feof() 0
fread 10 // 10 character read attempted, 10 were read
feof() 0 // eof is still clear as no attempt to read past the 10th, last char
fread 0
feof() 1
feof() does not read anything itself; it only reports the end-of-file indicator, which a read function sets when it tries to read past the last character. So when fgets() reads the last item, the indicator is not yet set, feof() returns zero, and the flow enters the while loop again. This time fgets() finds no more data, returns NULL, and sets the end-of-file indicator. feof() then sees the indicator and returns a non-zero value, breaking the while loop.
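Since the indicator is only set by a failed read, the usual pattern is to test the result of the read and consult feof() afterwards to find out why it failed. A minimal sketch, with a made-up helper name count_chars:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: count the characters in a file by testing the result
// of the read itself instead of calling feof() before reading.
// Returns -1 if the file cannot be opened or a read error occurs.
long count_chars(const char* path) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;

    long n = 0;
    int c;                                   // int, not char, so EOF is representable
    while ((c = std::getc(f)) != EOF) {
        ++n;                                 // ...process "c" here
    }
    // getc() returned EOF; only now do feof() and ferror() say why.
    const bool clean_eof = std::feof(f) != 0 && std::ferror(f) == 0;
    std::fclose(f);
    return clean_eof ? n : -1;
}
```

With this shape the loop body never runs on a character that was not actually read, which is the extra iteration that while(!feof(file)) tends to produce.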

How read file functions recognize end of a text file in C++?

As far as I know, there are two standard ways to read a text file in C++ (in this case, two numbers on every line):
Assume that every line consists of 2 numbers and read token by token:
#include <fstream>
std::ifstream infile("thefile.txt");
int a, b;
while (infile >> a >> b)
{
// process pair (a,b)
}
Line-based parsing, using string streams:
#include <sstream>
#include <string>
#include <fstream>
std::ifstream infile("thefile.txt");
std::string line;
while (std::getline(infile, line))
{
std::istringstream iss(line);
int a, b;
if (!(iss >> a >> b)) { break; } // error
// process pair (a,b)
}
I can also use the code below to check whether the file has ended:
while (!infile.eof())
My questions are:
Question1: How do these functions know that a line is the last
line? I mean, how does eof() return false/true?
As far as I know, they are reading a part of memory. What is the
difference between the part that belongs to the file and the parts
that do not?
Question2: Is there any way to cheat this function? I mean, is it
possible to add something in the middle of the text file (for example,
with a hex editor) and make eof() wrongly return true in
the middle of the text file?
Appreciate your time and consideration.
Question1: How do these functions know that a line is the last line? I mean, how does eof() return false/true?
It doesn't. The functions know when you've tried to read past the very last character in the file. They don't necessarily know whether a line is the last line. "Files" aren't the only things that you can read with streams. Keyboard input, a special-purpose device, internet sockets: all can be read with the right kind of I/O stream. When reading from standard input, the stream has no way of knowing whether the very next thing I type is control-Z.
With regard to files on a computer disk, most modern operating systems store metadata regarding the file separately from the file's contents. These metadata include the length of the file (and oftentimes when the file was last modified and when it was last read). On these systems, the stream buffer that underlies the I/O stream knows the current read location within the file and knows how long the file is. The stream buffer signals EOF when the read location reaches the length of the file.
That's not universal, however. There are some not-so-common operating systems that don't use this concept of metadata stored elsewhere. End of file on a disk file is just as surprising on these systems as is end of file from user input on a keyboard.
As far as I know, they are reading a part of memory. What is the difference between the part that belongs to the file and the parts that do not?
Learn the difference between memory and disk files. There's a huge difference between the two. Unless you're working with an embedded computer, memory is much more limited than is disk space.
Question2: Is there any way to cheat this function? I mean, is it possible to add something in the middle of the text file (for example, with a hex editor) and make eof() wrongly return true in the middle of the text file?
That depends very much on how the operating system implements files. On most modern operating systems, the answer is not just "no" but "No!". The concept of using some special signature that indicates end of file in a disk file is one of many computer science concepts that for the most part have been dumped into the pile of "that wasn't very smart" ideas. You asked your question on the internet. That most likely means you are using a Windows machine, a Linux machine, or a Mac. All of them store the length of a file as metadata separate from the contents of a file.
However, there is a need for the ability to clear the end of file indicator. One program might be writing to a file while at the same time another is reading from it. The reader might hit EOF while the writer is still active. The reader needs to clear the EOF indicator to continue reading what the writer has written. The C++ I/O streams provide the ability to do just that. Every I/O stream has a clear function. Whether it works is a different story. The clear will work temporarily, but the very next read might well set the EOF bit again. For example, when I type control-Z on my keyboard, that means I am done interacting with the program, period. My next action might well be to go out for lunch.
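The point that eof() only becomes true after a read has tried to go past the last character can be checked with a small sketch. The struct and function names here are hypothetical, and a string stream stands in for the file so the example is self-contained.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical result type: how many pairs were read, and whether the
// stream's eofbit was set when the loop stopped.
struct ParseResult {
    int pairs;
    bool reached_eof;
};

ParseResult read_pairs(std::istream& in) {
    ParseResult r{0, false};
    int a, b;
    while (in >> a >> b) {   // the extraction itself is the loop condition
        ++r.pairs;           // process pair (a, b)
    }
    // eofbit is set only because an extraction ran into the end of the input;
    // a malformed token in the middle sets failbit without eofbit.
    r.reached_eof = in.eof();
    return r;
}
```

Feeding it "1 2\n3 4\n" yields two pairs with eofbit set, while "1 2\nx 4\n" stops after one pair with eofbit clear, because the failure came from the character x and not from running out of input.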

C++ IO file streams: writing from one file to another using operator<< and rdbuf()

I have a question about copying data from one file to another in C++ (fstream) using operator<<. Here is a code snippet that works for me:
#include <fstream>
#include <string>
void writeTo(string &fname, ofstream &out){
ifstream in;
in.open(fname.c_str(),fstream::binary);
if(in.good()){
out<<in.rdbuf();
in.close();
}else{
//error
}
}
I would like to be certain that after writing, the end of input file in stream in has been reached. However, if I test for in.eof(), it is false, despite the fact that checking the input and output files confirms that the entire contents has been properly copied over. Any ideas on how I would check for in.eof()?
EOF-bit is set when trying to read a character, but none is available (i.e. you have already consumed everything in the stream). Apparently std::ostream::operator<<() does not attempt to read past the end of the stream through the istream itself, so the bit is never set.
You should be able to get around this by attempting to access the next character: add in.peek() before you check in.eof(). I have tested this fix and it works.
The reason none of the status bits are set in the input file is because
you are reading through the streambuf, not the istream; the actual
reading takes place in the ostream::operator<<, which doesn't have
access to the istream.
I'm not sure it matters, however. The input will be read until
streambuf::sgetc returns EOF. Which would cause the eofbit to be
set in the istream if you were reading through the istream. The
only thing which might prevent this if you were reading through the
istream is if streambuf::sgetc threw an exception, which would cause
badbit to be set in istream; there is no other mechanism provided
for an input streambuf to report a read error. So wrap your out <<
in.rdbuf() in a try ... catch block, and hope that the implementation
actually does check for hardware errors. (I haven't checked recently,
but a lot of early implementations totally ignored read errors, treating
them as a normal end of file.)
And of course, since you're literally reading bytes (despite the <<, I
don't see how one could call this formatted input), you don't have to
consider the third possible source of errors, a format error (such as
"abc" when inputting an int).
Try in.rdbuf()->sgetc() == EOF.
Reference: http://www.cplusplus.com/reference/iostream/streambuf/sgetc/
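Putting the suggestions above together, a check after the copy might look like the following sketch. The helper name copy_and_check_eof and its parameters are assumptions made for illustration; the peek() call is the one extra read attempt made through the istream itself.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: copy src to dst through the stream buffer, then
// report whether the copy succeeded and the input is at end of file.
bool copy_and_check_eof(const std::string& src, const std::string& dst) {
    assert(src != dst);
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;

    out << in.rdbuf();   // reads through the streambuf, bypassing "in"
    // No istream operation has run yet, so eofbit on "in" is still clear.

    in.peek();           // a read attempt through the istream; sets eofbit
                         // when no character is left
    return out.good() && in.eof();
}
```

One caveat worth knowing: if the source is empty, operator<< inserts no characters and sets failbit on the output stream, so this sketch reports failure for empty files.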

std::ios::failure (ios::badbit) problem with fstream.write()

I've been trying to debug my code for the past few hours and I couldn't figure out the problem. I eventually set my filestream to throw exceptions on failbit and I found that my filestream was setting the failbit for some reason. I have absolutely no idea why the failbit is being set, because all I'm doing is writing 2048-byte chunks of data to the stream until suddenly it fails (at the same spot each time).
I would like to show you my code to see if anyone can see a problem and possibly see what might cause a std::ios::failure to be thrown:
bool abstractBlock::encryptBlockRC4(char* key)
{//This encryption method can be chunked :)
getStream().seekg(0,std::ios::end);
int sLen = int(getStream().tellg())-this->headerSize;
seekg(0);//Seek to beginning of Data
seekp(0);
char* encryptionChunkBuffer = new char[2048]; //2KB chunk buffer
for (int chunkIterator =0; chunkIterator<sLen; chunkIterator+=2048)
{
if (chunkIterator+2048<=sLen)
{
getStream().read(encryptionChunkBuffer,2048);
char* encryptedData = EnDeCrypt(encryptionChunkBuffer,2048,key);
getStream().write(encryptedData,2048);
free(encryptedData);
}else{
int restLen = sLen-chunkIterator;
getStream().read(encryptionChunkBuffer,restLen);
char* encryptedData = EnDeCrypt(encryptionChunkBuffer,restLen,key);
getStream().write(encryptedData,restLen);
delete encryptedData;
}
}
delete [] encryptionChunkBuffer;
dataFlags |= DATA_ENCRYPTED_RC4; // Set the "encrypted (rc4)" bit
seekp(0); //Seek to beginning of Data
seekg(0); //Seek to beginning of Data
return true;
}
The above code is essentially encrypting a file in 2048-byte chunks. It reads 2048 bytes, encrypts them, and then writes them back to the stream (overwriting the "unencrypted" data that was previously there).
getStream() simply returns the fstream handle to the file that's being operated on.
The error always occurs when chunkIterator==86116352, on the line getStream().write(encryptedData,2048);
I know my code may be hard to decode, but maybe you can tell me some possible things that might trigger a failbit? Currently, I think the problem lies in the fact that I am reading from and writing to the same stream, but as I mentioned, any ideas about what can cause a failbit would help me investigate the problem further.
You must seekp between changing from reads to writes (and vice versa, using seekg). It appears your implementation allows you to avoid this some of the time. (You're probably running into implicit flushes or other buffer manipulation which hide the problem sometimes.)
C++03, §27.5.1p1, Stream buffer requirements
The controlled sequences can impose limitations on how the program can read characters from a
sequence, write characters to a sequence, put characters back into an input sequence, or alter the stream
position.
This just generally states these aspects are controlled by the specific stream buffer.
C++03, §27.8.1.1p2, Class template basic_filebuf
The restrictions on reading and writing a sequence controlled by an object of class
basic_filebuf<charT,traits> are the same as for reading and writing with the Standard C library
FILEs.
fstream, ifstream, and ofstream use filebufs.
C99, §7.19.5.3p6, The fopen function
When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
You may need to look up these calls to translate to iostreams terminology, but it is fairly straight-forward.
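In iostreams terms, that means a seekg before each read that follows a write, and a seekp before each write that follows a read. The sketch below shows the pattern on a chunked in-place transformation. The function name xor_in_place is made up, and a simple XOR stands in for the real EnDeCrypt call, whose behavior is not known here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: XOR every byte of a file with `mask`, in place,
// chunk by chunk. XOR is only a placeholder for the real transformation.
bool xor_in_place(const std::string& path, unsigned char mask,
                  std::size_t chunk_size = 2048) {
    assert(chunk_size > 0);
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) return false;

    file.seekg(0, std::ios::end);
    const std::streamoff length = file.tellg();
    if (length < 0) return false;
    std::vector<char> buffer(chunk_size);

    for (std::streamoff pos = 0; pos < length; ) {
        const std::streamsize want = static_cast<std::streamsize>(
            std::min<std::streamoff>(length - pos,
                                     static_cast<std::streamoff>(buffer.size())));

        file.seekg(pos);                     // positioning call before input
        if (!file.read(buffer.data(), want)) return false;

        for (std::streamsize i = 0; i < want; ++i) {
            char& ch = buffer[static_cast<std::size_t>(i)];
            ch = static_cast<char>(ch ^ mask);
        }

        file.seekp(pos);                     // positioning call before output
        if (!file.write(buffer.data(), want)) return false;

        pos += want;
    }
    return static_cast<bool>(file.flush());
}
```

Tracking the offset explicitly and seeking to it before every read and every write keeps the code independent of whatever the stream buffer happens to do internally, at the cost of a couple of extra calls per chunk.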
You sometimes free the result of EnDeCrypt and sometimes delete it (but with single-object delete and not the array form); this most likely doesn't contribute to the problem you see, but it's either an error on your part or, less likely, on the part of the designer of EnDeCrypt.
You're using:
char* encryptionChunkBuffer = new char[2048]; //2KB chunk buffer
//...
getStream().read(encryptionChunkBuffer,2048);
//...
delete[] encryptionChunkBuffer;
But it would be better and easier to use:
vector<char> encryptionChunkBuffer (2048); //2KB chunk buffer
//...
getStream().read(&encryptionChunkBuffer[0], encryptionChunkBuffer.size());
//...
// no delete
If you don't want to type encryptionChunkBuffer.size() twice, then use a local constant for it.