Related
It seems well accepted that the istream::peek operation is blocking.
The standard, though arguably a bit ambiguous, leans towards nonblocking behavior. peek calls sgetc in turn, whose behavior is:
"The character at the current position of the controlled input sequence, as a value of type int.
If there are no more characters to read from the controlled input sequence, the function returns the end-of-file value (EOF)."
It doesn't say "If there are no more characters.......wait until there are"
Am I missing something here? Or are the peek implementations we use just kinda wrong?
The controlled input sequence is the file (or whatever) from which you're reading. So if you're at end of file, it returns EOF. Otherwise it returns the next character from the file.
I see nothing here that's ambiguous at all--if it needs a character that hasn't been read from the file, then it needs to read it (and wait till it's read, and return it).
If you're reading from something like a socket, then it's going to wait until data arrives (or the network stack detects EOF, such as the peer disconnecting).
The description from cppreference.com might be clearer than the one in your question:
Ensures that at least one character is available in the input area by [...] reading more data in from the input sequence (if applicable)."
"if applicable" does apply in this case; and "reading data from the input sequence" entails waiting for more data if there is none and the stream is not in an EOF or other error state.
When I get confused about console input I remind myself that console input can be redirected to come from a file, so the behavior of the keyboard more or less mimics the behavior of a file. When you try to read a character from file, you can get one of two results: you get a character, or you get EOF because you've reached the end of the file -- there are no more characters to be read. Same thing for keyboard input: either you get a character, or you get EOF because you've reached the end of the file. With a file, there is no notion of waiting for more characters: either a file has unread characters or it doesn't. Same thing for the keyboard. So if you have't reached EOF on the keyboard, reading a character returns the next character. You reach EOF on the keyboard by typing whatever character your system recognizes as EOF; on Unix systems that's ctrl-D, on Windows (if I remember correctly) that's ctrl-C. If you haven't reached EOF, there are more characters to be read.
I'm a beginner in C++ and trying to better understand feof(). I've read that feof() flag is set to true only after trying to read past the end of a file so many times beginners will read once more than they were expecting if they do something like while(!feof(file)). What I'm trying to understand though, is how does it actually interpret that an attempt has been made to read past the end of the file? Is the entire file already read in and the number of characters already known or is there some other mechanism at work?
I realize this may be a duplicate question somewhere, but I've been unable to find it, probably because I don't know the best way to word what I'm asking. If there is an answer already out there a link would be much appreciated. Thanks.
Whatever else the C++ library does, eventually it has to read from the file. Somewhere in the operating system, there is a piece of code that eventually handles that read. It obtains from the filesystem the length of the file, stored the same way the filesystem stores everything else. Knowing the length of the file, the position of the read, and the number of bytes to be read, it can make the determination that the low-level read hits the end of the file.
When that determination is made, it is passed up the stack. Eventually, it gets to the standard library which records internally that the end of file has been reached. When a read request into the library tries to go past that recorded end, the EOF flag is set and feof will start returning true.
feof() is a part of the standard C library buffered I/O. Since it's buffered, fread() pre-reads some data (definitely not the whole file, though). If, while buffering, fread() detects EOF (the underlying OS routine returns a special value, usually -1), it sets a flag on the FILE structure. feof() simply checks for that flag. So feof() returning true essentially means “a previous read attempt encountered end of file”.
How EOF is detected is OS/FS-specific and has nothing to do whatsoever with the C library/language. The OS has some interface to read data from files. The C library is just a bridge between the OS and the program, so you don't have to change your program if you move to another OS. The OS knows how the files are stored in its filesystem, so it knows how to detect EOF. My guess is that typically it is performed by comparing the current position to the length of the file, but it may be not that easy and may involve a lot of low-level details (for example, what if the file is on a network drive?).
An interesting question is what happens when the stream is at the end, but it was not yet detected by any reads. For example, if you open an empty file. Does the first call to feof() before any fread() return true or false? The answer is probably false. The docs aren't terribly clear on this subject:
This indicator is generally set by a previous operation on the stream
that attempted to read at or past the end-of-file.
It sounds as if a particular implementation may choose some other unusual ways to set this flag.
Most file system maintain meta information about the file (including it's size), and an attempt to read past the end of results in the feof flag being set. Others, for instance, old or lightweight file systems, set feof when they come to the last byte of the last block in the chain.
How does feof() actually know when the end of file is reached?
When code attempts to read passed the last character.
Depending on the file type, the last character is not necessarily known until a attempt to read past it occurs and no character is available.
Sample code demonstrating feof() going from 0 to 1
#include <stdio.h>
void ftest(int n) {
FILE *ostream = fopen("tmp.txt", "w");
if (ostream) {
while (n--) {
fputc('x', ostream);
}
fclose(ostream);
}
FILE *istream = fopen("tmp.txt", "r");
if (istream) {
char buf[10];
printf("feof() %d\n", feof(istream));
printf("fread %zu\n", fread(buf, 1, 10, istream));
printf("feof() %d\n", feof(istream));
printf("fread %zu\n", fread(buf, 1, 10, istream));
printf("feof() %d\n", feof(istream));
puts("");
fclose(istream);
}
}
int main(void) {
ftest(9);
ftest(10);
return 0;
}
Output
feof() 0
fread 9 // 10 character read attempted, 9 were read
feof() 1 // eof is set as previous read attempted to read passed the 9th or last char
fread 0
feof() 1
feof() 0
fread 10 // 10 character read attempted, 10 were read
feof() 0 // eof is still clear as no attempt to read passed the 10th, last char
fread 0
feof() 1
The feof() function sets the end of file indicator when the EOF character is read. So when feof() reads the last item, the EOF is not read along with it at first. Since no EOF indicator is set and feof() returns zero, the flow enters the while loop again. This time fgets comes to know that the next character is EOF, its discards it and returns NULL but also sets the EOF indicator. So feof() detects the end of file indicator and returns a non-zero value therefore breaking the while loop.
I am a little confused with function fsetpos in the stdio.h library. I want to be to write to different indexes (i.e do not want to write to a file contiguously) in a file. I was considering using fsetpos however the documentation states..
The internal file position indicator associated with stream is set to the position
represented by pos, which is a pointer to an fpos_t object whose value shall have been
previously obtained by a call to fgetpos.
It does not make sense to me that I have to set the position based on the call from fgetpos. Whats the point since it will just set it to the position it is already set at. Or I am I not understanding it correctly ?
From the C11 standard, fseek has a similar limitation:
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET
The reason is that text streams don't have a one-to-one mapping between the actual bytes of the source and the bytes you would get from fgetc; e.g. on windows systems, the newline character in C tends to be translated into a sequence of two binary characters: carriage return, then line feed.
Consequently, the notion of arbitrarily positioning a text stream based on a numerical index is fraught with complications and surprises.
In fact, the documentation of ftell warns
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
Binary streams don't have this limitation, although
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END
The above assumes you are working with byte-oriented streams. Wide-oriented streams have additional restrictions. e.g. under Streams:
Binary wide-oriented streams have the file-positioning restrictions ascribed to both text and binary streams
and
For wide-oriented streams, after a successful call to a file-positioning function that leaves the file position indicator prior to the end-of-file, a wide character output function can overwrite a partial multibyte character; any file contents beyond the byte(s) written are henceforth indeterminate
fsetpos does more than just set the file position: again from the C11 standard:
The fsetpos function sets the mbstate_t object (if any) and file position indicator
which makes it more suitable for setting the position in a wide-oriented streams.
DISCLAIMER: Don't use ::feof() as your loop condition. For example, see the answer to: file reading: feof() for binary files
However, I have "real" code that demonstrates a problem that does not use ::feof() as my loop condition, but LOGICALLY, this is the easiest way to demonstrate the problem.
Consider the following: We iterate a character stream one-at-a-time:
FILE* my_file;
// ...open "my_file" for reading...
int c;
while(0 == ::feof(my_file))
{ // We are not at EOF
c = ::getc(my_file);
// ...process "c"
}
The above code works as expected: The file is processed one-char-at-a-time, and upon EOF, we drop out.
HOWEVER, the following has unexpected behavior:
FILE* my_file;
// ...open "my_file" for reading...
int c;
while(0 == ::_eof(::fileno(my_file)))
{ // We are not at EOF
c = ::getc(my_file);
// ...process "c"
}
I would have expected them to perform the same. ::fileno() properly returns the (integer) file descriptor every time. However, the test ::_eof(::fileno(my_file)) works exactly once, and then returns 1 (indicating an EOF) on the second attempt.
I do not understand this.
I suppose it is conceivable that ::feof() is "buffered" (so it works correctly) while ::_eof() is "un-buffered" and thinks the whole file is "read-in" already (because the whole file would have fit into the first block read in from disk). However, that can't possibly be true given the purpose of those functions. So, I'm really at a loss.
What's going on?
(Files are opened as "text", are ASCII text files with about a dozen lines, MSVS2008, Win7/64.)
I suppose it is conceivable that ::feof() is "buffered" (so it works
correctly) while ::_eof() is "un-buffered" and thinks the whole file
is "read-in" already (because the whole file would have fit into the
first block read in from disk). However, that can't possibly be true
given the purpose of those functions. So, I'm really at a loss.
I don't know why you would think it "can't possibly be true given the purpose of those functions." The 2 functions are meant to operate on files that are opened and operated on in different ways, so they are not compatible.
In fact, that is exactly what is happening. Try this:
FILE* my_file;
// ...open "my_file" for reading...
int c;
while(0 == ::_eof(::fileno(my_file)))
{ // We are not at EOF
c = ::getc(my_file);
long offset1 = ftell(my_file);
long offset2 = _tell(fileno(my_file));
if (offset1 != offset2)
{
//here you will see that the file pointers are different
//which means that _eof and feof will fire true under different conditions
}
// ...process "c"
}
I will try to elaborate a bit based on your comment.
When you call fopen, you are getting back a pointer to a file stream. The underlying stream object keeps it's own file pointer which is separate from the actual file pointer associated with the underlying file descriptor.
When you call _eof you are asking if you have reached the end of the actual file. When you call feof, you are asking if you have reached the end of the file stream. Since file streams are usually buffered, the end of the file is reached before the end of the stream.
I'm still trying to understand your answer below, and what the purpose
is for _eof() if it always returns 1 even when you didn't read
anything (after the first char).
To answer this question, the purpose of _eof is to determine if you have reached the end of the file when using _open and _read to work directly with file descriptors, not when you use fopen and fread or getc to work with file streams.
Most of the languages like C++ when writing into a file, put an EOF character even if we miss to write statements like :
filestream.close
However is there any way, we can put the EOF character according to our requirement, in C++, for an instance.
Or any other method we may use apart from using the functions provided in C++.
If you need to ask more of information then kindly do give a comment.
EDIT:
What if, we want to trick the OS and place an EOF character in a file and write some data after the EOF so that an application like notepad.exe is not able to read after our EOF character.
I have read answers to the question related to this topic and have come to know that nowdays OS generally don't see for an EOF character rather check the length of file to get the correct idea of knowing about the length of the file but, there must be a procedure in OS which would be checking the length of file and then updating the file records.
I am sorry if I am wrong at any point in my estimation but please do help me because it can lead to a lot of new ideas.
There is no EOF character. EOF by definition "is unequal to any valid character code". Often it is -1. It is not written into the file at any point.
There is a historical EOF character value (CTRL+Z) in DOS, but it is obsolete these days.
To answer the follow-up question of Apoorv: The OS never uses the file data to determine file length (files are not 'null terminated' in any way). So you cannot trick the OS. Perhaps old, stupid programs won't read after CTRL+Z character. I wouldn't assume that any Windows application (even Notepad) would do that. My guess is that it would be easier to trick them with a null (\0) character.
Well, EOF is just a value returned by the function defined in the C stdio.h header file. Its actually returned to all the reading functions by the OS, so its system dependent. When OS reaches the end of file, it sends it to the function, which in its return value than places most commonly (-1), but not always. So, to summarize, EOF is not character, but constant returned by the OS.
EDIT: Well, you need to know more about filesystem, look at this.
Hi, to your second question:
once again, you should look better into filesystems. FAT is a very nice example because you can find many articles about it, and its principles are very similar to NTFS. Anyway, once again, EOF is NOT a character. You cannot place it in file directly. If you could do so, imagine the consequences, even "dumb" image file could not be read by the system.
Why? Because OS works like very complex structure of layers. One of the layers is the filesystem driver. It makes sure that it transfers data from every filesystem known to the driver. It provides a bridge between applications and the actual system of storing files into HDD.
To be exact, FAT filesystem uses the so-called FAT table - it is a table located close to the start of the HDD (or partition) address space, and it contains map of all clusters (little storage cells). OK, so now, when you want to save some file to the HDD, OS (filesystem driver) looks into FAT table, and searches for the value "0x0". This "0x0" value says to the OS that cluster which address is described by the location of that value in FAT table is free to write.
So it writes into it the first part of the file. Then, it looks for another "0x0" value in FAT, and if found, it writes the second part of the file into cluster which it points to. Then, it changes the value of the first FAT table record where the file is located to the physical address of the next in our case second part of the file.
When your file is all stored on HDD, now there comes the final part, it writes desired EOF value, but into FAT table, not into the "data part" of the HDD. So when the file is read next time, it knows this is the end, don´t look any further.
So, now you see, if you would want to manually write EOF value into the place it doesn't belong to, you have to write your own driver which would be able to rewrite the FAT record, but this is practically impossible to do for beginners.
I came here while going through the Kernighan & Ritchie C exercises.
Ctrl+D sends the character that matches the EOF constant from stdio.h.
(Edit: this is on Mac OS X; thanks to #markmnl for pointing out that the Windows 10 equivalent is Ctrl+Z)
Actually in C++ there is no physical EOF character written to a file using either the fprintf() or ostream mechanisms. EOF is an I/O condition to indicate no more data to read.
Some early disk operating systems like CP/M actually did use a physical 0x1A (ASCII SUB character) to indicate EOF because the file system only maintained file size in blocks so you never knew exactly how long a file was in bytes. With the advent of storing actual length counts in the directory it is no longer typical to store an "EOF" character as part of the 'in-band' file data.
Under Windows, if you encounter an ASCII 26 (EOF) in stdin, it will stop reading the rest of the data. I believe writing this character will also terminate output sent to stdout, but I haven't confirmed this. You can switch the stream to binary mode as in this SO question:
#include <io.h>
#include <fcntl.h>
...
_setmode(0, _O_BINARY)
And not only will you stop 0x0A being converted to 0x0D 0x0A, but you'll also gain the ability to read/write 0x1A as well. Note you may have to switch both stdin (0) and stdout (1).
If by the EOF character you mean something like Control-Z, then modern operating systems don't need such a thing, and the C++ runtime will not write one for you. You can of course write one yourself:
filestream.put( 26 ); // write Ctrl-Z
but there is no good reason to do so. There is also no need to do:
filesystem.close();
as the file stream will be closed for you automatically when its destructor is called, but it is (I think) good practice to do so.
There is no such thing as the "EOF" character. The fact of closing the stream in itself is the "EOF" condition.
When you press Ctrl+D in a unix shell, that simply closes the standard input stream, which in turn is recognized by the shell as "EOF" and it exits.
So, to "send" an "EOF", just close the stream to which the "EOF" needs to be sent.
Nobody has yet mentioned the [f]truncate system calls, which are how you make a file shorter without recreating it from scratch.
The truncate() and ftruncate() functions cause the regular file named by path or referenced by fd to be truncated to a size of precisely length bytes.
If the file previously was larger than this size, the extra data is lost. If the file previously was shorter, it is extended, and the extended part reads as null bytes ('\0').
Understand that this is a distinct operation from writing any sort of data to the file. The file is a linear array of bytes, laid out on disk somehow, with metadata that says how long it is; truncate changes the metadata.
On modern filesystems EOF is not a character, so you don't have to issue it when finishing to write to a file. You just have to close the file or let the OS do it for you when your process terminates.
Yes, you can manually add EOF to a file.
1) in Mac terminan, create a new file. touch filename.txt
2) Open the file in VI
vi filename.txt
3) In Insert mode (hit i), type Control+V and then Control+D. Do not let go of the Control key on the Mac.
Alternatively, if I want other ^NewLetters, like ^N^M^O^P, etc, I could do Contorl+V and then Control+NewLetter. So for example, to do ^O, hold down control, and then type V and O, then let go of Control.