EOF before EOF in Visual Studio - C++

I had this snippet in a program (in Visual Studio 2005):
if (_eof(fp->_file))
{
    break;
}
It broke the enclosing loop when EOF was reached, but the program was not able to parse the last few thousand characters in the file. So, in order to find out what was happening, I did this:
if (_eof(fp->_file))
{
    cout << ftell(fp) << endl;
    break;
}
Now the answer I got from ftell was different from (and smaller than) the actual file size, which was unexpected. I thought Windows might have some problem with the file, so I did this:
if (_eof(fp->_file))
{
    cout << ftell(fp) << endl;
    fseek(fp, 0, SEEK_END);
    cout << ftell(fp) << endl;
    break;
}
Well, the fseek() gave the right answer (equal to the file size) and the initial ftell() failed (as described above).
Any idea about what could be wrong here?
EDIT: The file is open in "rb" mode.

You can't reliably use _eof() on a file descriptor obtained from a FILE*, because FILE* streams are buffered. That means fp has sucked fp->_file dry and stores the remaining bytes in its internal buffer. Eventually fp->_file is at the EOF position while fp still has bytes for you to read. Use feof() after a read operation to determine whether you are at the end of the file, and be careful when you mix functions that operate on FILE* with those that operate on integer file descriptors.

You should not be using _eof() directly on the descriptor if your file I/O operations go through the FILE stream that wraps it. There is buffering taking place, and the underlying descriptor will hit end-of-file before your application has read all the data from the FILE stream.
In this case, ftell(fp) is reporting the state of the stream, and you should be using feof(fp) to keep everything in the same I/O domain.
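For illustration, a read loop that consults feof() only after a read has come up short might look like this (a minimal sketch; the buffer size and the commented-out processing step are placeholders):

#include <cstdio>

// Minimal sketch: drive the loop off fread()'s return value and
// check feof()/ferror() only after a short read.
void readAll(FILE *fp)
{
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
    {
        // process(buf, n); // placeholder for the real parsing code
    }
    if (feof(fp))
        puts("reached end of file"); // the loop ended for the right reason
    else
        puts("read error");          // ferror(fp) would be set here
}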

Related

Correct way of using fdopen

I mean to associate a file descriptor with a file pointer and use that for writing.
I put together program io.cc below:
#include <cstdio>    // FILE, fdopen, fprintf
#include <cstring>   // strlen
#include <iostream>  // cout
#include <unistd.h>  // write, close

using namespace std;

int main() {
    ssize_t nbytes;
    const int fd = 3;
    char c[100] = "Testing\n";
    nbytes = write(fd, (void *) c, strlen(c)); // Line #1
    FILE * fp = fdopen(fd, "a");
    fprintf(fp, "Writing to file descriptor %d\n", fd);
    cout << "Testing alternate writing to stdout and to another fd" << endl;
    fprintf(fp, "Writing again to file descriptor %d\n", fd);
    close(fd); // Line #2
    return 0;
}
I can alternately comment out lines 1 and/or 2, compile, and run
./io 3> io_redirect.txt
and check the contents of io_redirect.txt.
Whenever line 1 is not commented out, it produces the expected line Testing\n in io_redirect.txt.
If line 2 is commented out, I also get the expected lines
Writing to file descriptor 3
Writing again to file descriptor 3
in io_redirect.txt.
But if line 2 is not commented out, those lines do not show up.
Why is that?
What is the correct way of using fdopen?
NOTE.
This seems to be the right approach for a (partial) answer to Smart-write to arbitrary file descriptor from C/C++
I say "partial" since I would be able to use C-style fprintf.
I still would like to also use C++-style stream<<.
EDIT:
I was forgetting about fclose(fp).
That "closes" part of the question.
Why is that?
The opened stream ("stream" is an opened FILE*) is block buffered, so nothing gets written to the destination before the file is flushed. Exiting from an application closes all open streams, which flushes the stream.
Because you close the underlying file descriptor before flushing the stream, the behavior of your program is undefined. I would really recommend you to read posix 2.5.1 Interaction of File Descriptors and Standard I/O Streams (which is written in a horrible language, nonetheless), from which:
... if two or more handles are used, and any one of them is a stream, the application shall ensure that their actions are coordinated as described below. If this is not done, the result is undefined.
...
For the first handle, the first applicable condition below applies. ...
...
If it is a stream which is open for writing or appending (but not also open for reading), the application shall either perform an fflush(), or the stream shall be closed.
A "handle" is a file descriptor or a stream. An "active handle" is the last handle that you did something with.
The fp stream is the active handle that is open for appending to file descriptor 3. Because fp is an active handle that is never flushed, and you then switch the active handle to fd with close(fd), the behavior of your program is undefined.
My guess at what most probably happens: your C standard library implementation calls fflush(fp) after main returns, but because fd is already closed, the internal write(3, ...) call returns an error and nothing is written to the output.
What is the correct way of using fdopen?
The usage you presented is the correct way of using fdopen.
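For completeness, a minimal sketch of the coordinated version, per the POSIX rule quoted above: the stream is flushed and closed before the descriptor goes away, and fclose() does both.

#include <cstdio>

// Sketch: fclose() flushes the stdio buffer and then closes the
// underlying descriptor itself, so no separate close(fd) is needed.
int writeViaStream(int fd)
{
    FILE *fp = fdopen(fd, "a");
    if (fp == NULL)
        return -1;
    fprintf(fp, "Writing to file descriptor %d\n", fd);
    return fclose(fp); // flushes, then closes fd as well
}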

Reading a potentially incomplete file in C++

I am writing a program to reformat a DNS log file for insertion to a database. There is a possibility that the line currently being written to in the log file is incomplete. If it is, I would like to discard it.
I started off believing that the eof function might be a good fit for my application; however, I noticed a lot of programmers advising against using the eof function. I have also noticed that the feof function seems quite similar.
Any suggestions/explanations that you guys could provide about the side effects of these functions would be most appreciated, as would any suggestions for more appropriate methods!
Edit: I currently am using the istream::peek function in order to skip over the last line, regardless of whether it is complete or not. While acceptable, a solution that determines whether the last line is complete would be preferred.
The specific comparison I'm using is: logFile.peek() != EOF
I would consider using
int fseek ( FILE * stream, long int offset, int origin );
with SEEK_END
and then
long int ftell ( FILE * stream );
to determine the number of bytes in the file, and therefore where it ends. I have found this more reliable for detecting the end of the file (in bytes).
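For example (a minimal sketch; restoring the original position afterwards is optional but usually what you want):

#include <cstdio>

// Sketch: report a file's size in bytes via fseek()/ftell().
long fileSize(FILE *stream)
{
    long current = ftell(stream);     // remember where we were
    fseek(stream, 0, SEEK_END);
    long size = ftell(stream);        // offset of the end == size in bytes
    fseek(stream, current, SEEK_SET); // restore the old position
    return size;
}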
Could you detect an end-of-record/line (EOR) marker (CRLF perhaps) in the last two or three bytes of the file? (Three bytes might be needed for CRLF^Z, depending on the file type.) This would verify whether you have a complete last row:
fseek(stream, -2, SEEK_END);
fread(tail, 1, 2, stream); // read the last two bytes, then inspect them
If you try to open the file with exclusive locks, you can detect (by the failure to open) that the file is in use, and try again in a second...(or whenever)
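A sketch of that end-of-record check (it assumes the file is open in binary mode and uses plain CRLF endings; adjust the marker bytes for your log format):

#include <cstdio>

// Sketch: true if the file ends with "\r\n", i.e. the last record
// was written out completely. Assumes binary mode and CRLF endings.
bool lastLineComplete(FILE *stream)
{
    char tail[2] = { 0, 0 };
    if (fseek(stream, -2, SEEK_END) != 0)
        return false; // file shorter than two bytes
    if (fread(tail, 1, 2, stream) != 2)
        return false;
    return tail[0] == '\r' && tail[1] == '\n';
}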
If you need to capture the file contents as the file is being written, it's much easier if you eliminate as many layers of indirection and buffering between your logic and the actual bytes of data in the file.
Do not use C++ I/O streams of any type - you have no real control over them. Don't use FILE*-based functions such as fopen() and fread() either - those are buffered, and even if you disable buffering there are still layers of code between your code and the data that, once again, you can't control and can't see into.
In a POSIX environment, you can use low-level C-style open() and read()/pread() calls. And use fstat() to know when the file contents have changed - you'll see the st_size member of the struct stat argument change after a call to fstat().
You'd open the file like this:
int logFileFD = open( "/some/file/name.log", O_RDONLY );
Inside a loop, you could do something like this (error checking and actual data processing omitted):
size_t lastSize = 0;
while ( !done )
{
    struct stat statBuf;
    fstat( logFileFD, &statBuf );
    if ( statBuf.st_size == lastSize )
    {
        sleep( 1 ); // or however long you want
        continue;   // go to next loop iteration
    }
    // process new data - might need to keep some of the old data
    // around to handle lines that cross boundaries
    processNewContents( logFileFD, lastSize, statBuf.st_size );
    lastSize = statBuf.st_size; // remember how far we've processed
}
processNewContents() could look something like this:
void processNewContents( int fd, size_t start, size_t end )
{
    static char oldData[ BUFSIZE ];
    static char newData[ BUFSIZE ];
    // assumes amount of data will fit in newData...
    // (note the pread() argument order: buffer, count, then offset)
    ssize_t bytesRead = pread( fd, newData, end - start, start );
    // process the data that was read here
    return;
}
You may also find that you need to add some code to close() and then re-open() the file in case your application doesn't seem to be "seeing" data written to it. I've seen that happen on some systems - the application somehow sees a cached copy of the file size while an ls run in another context gets the more accurate, updated size. If, for example, you know your log file is written to every 10-15 seconds, and you go 30 seconds without seeing any change to it, you know to try reopening the file.
You can also track the inode number in the struct stat results to catch log file rotation.
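A sketch of that rotation check (logPath is a placeholder for your log file's path):

#include <sys/stat.h>

// Sketch: detect log rotation by comparing the inode of the open
// descriptor against the inode currently behind the path name.
bool fileWasRotated(int fd, const char *logPath)
{
    struct stat fdStat, pathStat;
    if (fstat(fd, &fdStat) != 0 || stat(logPath, &pathStat) != 0)
        return true; // path vanished - treat it as rotated
    return fdStat.st_ino != pathStat.st_ino;
}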
In a non-POSIX environment, you can replace open(), fstat() and pread() calls with the low-level OS equivalent, although Windows provides most of what you'd need. On Windows, lseek() followed by read() would replace pread().

C++ read() problems

I'm having trouble reading a large file into my own buffer in C++ in Visual Studio 2010. Below is a snippet of my code where length is the size of the file I'm reading in, bytesRead is set to 0 before this is run, and file is a std::ifstream.
buffer = new char[length];
while ( bytesRead < length ) {
    file.read( buffer + bytesRead, length - bytesRead );
    bytesRead += file.gcount();
}
file.close();
I noticed that gcount() returns 0 from the second read onwards, meaning that read() did not give me any new characters, so this is an infinite loop. I would like to continue reading the rest of the file. I know that the eofbit is set after the first read even though there is more data in the file.
I do not know what I can do to read more. Please help.
Make sure you open your file in binary mode (std::ios::binary) to avoid any newline conversions. Using text mode could invalidate your assumption that the length of the file is the number of bytes that you can read from the file.
In any case, it is good practice to examine the state of the stream after the read and stop if there has been an error (rather than continuing indefinitely).
It sounds like your stream is in a failed state, at which point most operations will just immediately fail. You'll need to clear the stream to continue reading.
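Putting the two answers together, a corrected loop might look like this (a sketch; it assumes length holds the file's byte count, as in the question):

#include <fstream>

// Sketch: open in binary mode and stop as soon as the stream reports
// eof or an error, instead of looping forever on a failed stream.
std::streamsize readFile(const char *path, char *buffer, std::streamsize length)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    std::streamsize bytesRead = 0;
    while (bytesRead < length)
    {
        file.read(buffer + bytesRead, length - bytesRead);
        bytesRead += file.gcount();
        if (!file) // eofbit or failbit set: no more data will arrive
            break;
    }
    return bytesRead;
}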

Does constructing an iostream (c++) read data from the hard drive into memory?

When I construct an iostream when say opening a file will this always read the entire file from the hard disk and then put it into memory, or is it streamed in and buffered by the OS on demand?
I ask because one way to check whether a file exists is to see if opening it fails, but I fear that if the files I am opening are very large, this could take a long time if iostream must read the entire file on open.
To check whether a file exists can be done like this if you want to use boost.
#include <boost/filesystem.hpp>
bool fileExists = boost::filesystem::exists("foo.txt");
No, it will not read the entire file into memory when you open it. It will read your file in chunks though, but I believe this process will not start until you read the first byte. Also these chunks are relatively small (on the order of 4-128 kibibytes in size), and the fact it does this will speed things up greatly if you are reading the file sequentially.
In a test on my Linux box (well, Linux VM) simply opening the file only results in the OS open system call, but no read system call. It doesn't start reading anything from the file until the first attempt to read from the stream. And then it reads 8191 (why 8191? that seems a very strange number) byte chunks as I read the file in.
Opening a file is a bad way of testing if the file exists - all it does is tell you if you can open it. Opening might fail for a number of reasons, typically because you don't have read permission, but the file will still exist. It is usually better to use an operating system specific function to test for existence. And no, opening an fstream will not cause the contents to be read.
What I think happens is this: when you open a file, the corresponding data structures for the process opening the file are populated; these include the file pointer, file descriptor, v-node, etc.
Now one can read from and write to a file using buffered streams (fwrite, fread) or using system calls (read and write).
When we use buffered streams, the data is buffered and then written or read (this is done for efficiency). That in itself means the whole file is not read into memory; rather, a certain number of bytes are read into a buffer and then made available.
In the case of syscalls such as read and write, kernel-level buffering is done (using fsync one can flush out the kernel buffer too), but the data is actually read from and written to the device/file.
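A small sketch contrasting the two flush points (the file name and mode are just placeholders):

#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Sketch: stdio buffering (flushed by fflush) versus a direct system
// call (kernel buffering, pushed to the device by fsync).
void bufferedWrite()
{
    FILE *fp = fopen("out.txt", "w");
    fwrite("hello\n", 1, 6, fp); // lands in the stdio buffer first
    fflush(fp);                  // now handed over to the kernel
    fclose(fp);
}

void directWrite()
{
    int fd = open("out.txt", O_WRONLY | O_CREAT, 0644);
    write(fd, "hello\n", 6); // goes straight to the kernel
    fsync(fd);               // pushes the kernel buffer to the device
    close(fd);
}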
Checking the existence of a file:
#include <sys/stat.h>
#include <iostream>
#include <string>

using namespace std;

int main() {
    struct stat file_i;
    string f("myfile.txt");
    if (stat(f.c_str(), &file_i) != 0) {
        cout << "File not found" << endl;
    }
    return 0;
}
Hope this clarifies a bit.

feof() returning true when EOF is not reached

I'm trying to read from a file at a specific offset (simplified version):
typedef unsigned char u8;
FILE *data_fp = fopen("C:\\some_file.dat", "r");
fseek(data_fp, 0x004d0a68, SEEK_SET); // move filepointer to offset
u8 *data = new u8[0x3F0];
fread(data, 0x3F0, 1, data_fp);
delete[] data;
fclose(data_fp);
The problem is that data will not contain 1008 bytes but only 529 (the number seems random). Once it reaches 529 bytes, calls to feof(data_fp) start returning true.
I've also tried reading in smaller chunks (8 bytes at a time), but it still looks like it's hitting EOF before it should.
A simple look in a hex editor shows there are plenty of bytes left.
Opening a file in text mode, as you're doing, makes the library translate some of the file contents, potentially triggering an unwarranted EOF or bad offset calculations.
Open the file in binary mode by passing the "b" option to the fopen call:
fopen(filename, "rb");
Is the file being written to in parallel by some other application? Perhaps there's a race condition, so that the file ends at wherever the read stops, when the read is running, but later when you inspect it the rest has been written. That would explain the randomness, too.
Maybe it's a difference between text and binary mode. If you're on Windows, newlines are CRLF, which is two characters in the file but converted to only one when read. Try using fopen(..., "rb").
I can't see your link from work, but if your computer claims no more bytes exist, I'd tend to believe it. Why don't you print the size of the file rather than checking things by hand in a hex editor?
Also, you'd be better off using level 2 I/O; the f-calls are ancient C ugliness, and you're using C++ since you have new.
int fh = open(filename, O_RDONLY);
struct stat s;
fstat(fh, &s); // note: fstat takes a pointer to the stat struct
cout << "size=" << hex << s.st_size << "\n";
Now do your seeking and reading using level 2 I/O calls, which are faster anyway, and let's see what the size of the file really is.
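Continuing from that snippet, the seek and read could look like this (a sketch; the offset and length are the values from the question):

#include <fcntl.h>
#include <unistd.h>

// Sketch, continuing from the fstat() snippet above: seek to the
// question's offset and read 0x3F0 bytes with low-level calls.
unsigned char data[0x3F0];
lseek(fh, 0x004d0a68, SEEK_SET);
ssize_t got = read(fh, data, sizeof data);
cout << "read " << dec << got << " bytes\n"; // dec: undo the earlier hex
close(fh);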