I have a question about I/O buffering in the standard library:
I read "The Linux Programming Interface", chapter 13, on file I/O buffering. The author mentions that the standard library uses I/O buffering for disk files and the terminal.
My question: does this I/O buffering also apply to FIFOs, pipes, sockets, and network files?
Yes, if you're using the FILE *-based standard I/O library. The only odd thing that might happen is if the underlying system file descriptor returns non-zero from the isatty function; then stdio might 'line buffer' both input and output, meaning it tends to flush when it sees a '\n'.
I believe that it's required to line buffer stdout if file descriptor 1 returns non-zero for isatty.
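As a minimal sketch (assuming a POSIX system, since isatty and fileno are POSIX), here is how you can detect a terminal and override the default buffering with setvbuf:

    #include <cstdio>
    #include <unistd.h>   // isatty, fileno (POSIX)

    int main() {
        // setvbuf must be called before the first I/O operation on the
        // stream. Here we force full buffering even if stdout is a terminal.
        static char buf[64 * 1024];
        std::setvbuf(stdout, buf, _IOFBF, sizeof buf);

        if (isatty(fileno(stdout)))
            std::printf("stdout is a terminal; stdio would normally line buffer it\n");

        std::printf("this text sits in the 64 KiB buffer until flushed or exit\n");
        return 0;  // normal exit flushes all open streams
    }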
No. Anything that's an ordinary file descriptor (such as those returned by open(2), pipe(2), socket(2), and accept(2)) is not buffered—any data you read or write to it is input or output immediately via direct system calls.
Buffering only happens when you have FILE* objects, which you can get by fopen(3)'ing a regular disk file; the objects stdin, stdout, and stderr are also FILE* objects that are set up at program start. Buffering is usually enabled on FILE* objects, but not always: it can be disabled with setbuf(3), and stderr is unbuffered by default.
If you want to create a buffered stream out of a regular file descriptor, you can do so with fdopen(3).
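For example, a sketch (assuming a POSIX system) that wraps both ends of a pipe in buffered FILE* streams with fdopen(3):

    #include <cstdio>
    #include <unistd.h>   // pipe (POSIX)

    int main() {
        int fds[2];
        if (pipe(fds) == -1)
            return 1;

        // Wrap the raw descriptors in stdio streams; both get stdio buffering.
        FILE *rd = fdopen(fds[0], "r");
        FILE *wr = fdopen(fds[1], "w");

        std::fprintf(wr, "hello through a buffered pipe\n");
        std::fflush(wr);   // the data isn't in the pipe until the buffer is flushed

        char line[128];
        if (std::fgets(line, sizeof line, rd))
            std::fputs(line, stdout);

        std::fclose(wr);   // fclose also closes the underlying descriptor
        std::fclose(rd);
        return 0;
    }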
Windows has the FlushFileBuffers() API to flush the buffers of a single file through to the hard drive, and Linux has the sync() API to flush the file buffers of all files.
However, is there a WinAPI function for flushing all files too, i.e. a sync() analog?
https://learn.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-flushfilebuffers
It is possible to flush an entire volume:
To flush all open files on a volume, call FlushFileBuffers with a handle to the volume. The caller must have administrative privileges. For more information, see Running with Special Privileges.
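A sketch of that call (Windows-specific; it must run with administrative privileges, and the C: volume path is just an assumption for illustration):

    #include <windows.h>
    #include <cstdio>

    int main() {
        // Opening a volume handle requires administrative privileges.
        HANDLE hVolume = CreateFileW(L"\\\\.\\C:", GENERIC_WRITE,
                                     FILE_SHARE_READ | FILE_SHARE_WRITE,
                                     nullptr, OPEN_EXISTING, 0, nullptr);
        if (hVolume == INVALID_HANDLE_VALUE) {
            std::fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
            return 1;
        }

        // Flushes all open files on the volume, similar in spirit to sync().
        if (!FlushFileBuffers(hVolume))
            std::fprintf(stderr, "FlushFileBuffers failed: %lu\n", GetLastError());

        CloseHandle(hVolume);
        return 0;
    }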
Also, the same article states the correct procedure to follow if, for some reason, data must be flushed: call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags.
Due to disk caching interactions within the system, the FlushFileBuffers function can be inefficient when used after every write to a disk drive device when many writes are being performed separately. If an application is performing multiple writes to disk and also needs to ensure critical data is written to persistent media, the application should use unbuffered I/O instead of frequently calling FlushFileBuffers. To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.
But also check the memory and data alignment restrictions that apply to unbuffered file I/O.
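Here is a hedged sketch of unbuffered, write-through output; the 4096-byte sector size and the log.bin file name are assumptions, and real code should query the actual sector size (e.g. via GetDiskFreeSpace):

    #include <windows.h>
    #include <malloc.h>   // _aligned_malloc, _aligned_free
    #include <cstring>

    int main() {
        // With FILE_FLAG_NO_BUFFERING, the buffer address, transfer size,
        // and file offset must all be multiples of the volume sector size.
        const DWORD sectorSize = 4096;  // assumption; query the real value

        HANDLE h = CreateFileW(L"log.bin", GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS,
                               FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                               nullptr);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        // _aligned_malloc gives us a sector-aligned buffer.
        void *buf = _aligned_malloc(sectorSize, sectorSize);
        std::memset(buf, 'x', sectorSize);

        DWORD written = 0;
        WriteFile(h, buf, sectorSize, &written, nullptr);  // bypasses the cache

        _aligned_free(buf);
        CloseHandle(h);
        return 0;
    }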
According to the File Management Functions list, there is no sync() analog from Linux in the WinAPI.
I am creating a class that catches all console output and dumps it into one log. I need this because my program uses many 3rd-party libraries that I cannot change, and useful information from these libraries is printed to the console in a handful of ways.
I know about replacing cout/cerr with a custom stream buffer using rdbuf, so I don't need help with that. I also know about creating a pipe to capture C-style output, e.g. fprintf(stdout, "Hello, world!"). However, unlike a custom stream buffer, where I can handle output as it comes in, the C-style output is now stuck in this pipe, and I have to periodically flush everything and read from it. I would much rather get a notification or install a callback to handle pipe input as it happens.
Qt is in the mix here, too. I've been playing with the QSocketNotifier class, but it doesn't seem to work with either the pipe's read or write file descriptor.
Suggestions?
output is now stuck in this pipe and I have to periodically flush everything and read from it. I would much rather get a notification or install a callback to handle pipe input as it happens.
It's unclear what "everything" is or why you would need to do more than flush specific file streams, but this sounds like you are referring to the fact that these streams are buffered, so the pipes you have connected them to aren't written to until flush conditions are met or fflush() is called.
Further, we don't know whether you are manipulating the layer 3 file streams (the stdio FILE* objects) or the layer 2 file descriptors, nor whether you've disabled synchronization between the C++ streams and the layer 3 streams (std::ios::sync_with_stdio).
All that said, it is possible to disable the C layer 3 buffering with
setvbuf(stdout, NULL, _IONBF, 0);
setvbuf(stderr, NULL, _IONBF, 0);
This means you won't have to flush() anymore for, say, fprintf() calls to be written to the pipes.
As for getting notified of new data: you can set up a poll/select call to check for data on the pipes, or you can simply have threads perform blocking reads from them and transfer the data someplace else.
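For example, a minimal sketch of the poll() approach (POSIX; drain_pipe is a made-up name, and read_fd stands for the read end of whatever pipe you attached to stdout/stderr):

    #include <cstdio>
    #include <poll.h>
    #include <unistd.h>

    // Wait for data on the pipe's read end and forward it, e.g. to a log.
    void drain_pipe(int read_fd) {
        pollfd pfd{read_fd, POLLIN, 0};
        char buf[4096];

        for (;;) {
            if (poll(&pfd, 1, -1) <= 0)   // block until data (or error)
                break;
            ssize_t n = read(read_fd, buf, sizeof buf);
            if (n <= 0)                   // 0 means all write ends are closed
                break;
            fwrite(buf, 1, static_cast<size_t>(n), stderr);  // forward to the log
        }
    }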
On Linux, we use the following to redirect the standard streams:
freopen (outfile, "a", stdout);
freopen (outfile, "a", stderr);
I don't believe there's any way to get a notification.
Are standard input and standard output independent or not?
Consider a parent program that has launched a child, with the parent's standard output attached to the child's standard input, and the child's standard output attached to the parent's standard input.
    stdin   <--  stdout
  parent          child
    stdout  -->  stdin
If the child (asynchronously) continually read from its standard input and wrote data to its standard output, but the parent just wrote to the child's standard input and didn't read from the child's standard output at all:
    stdin (unread)   <<      stdout
  parent                      child
    stdout           ==>==>  stdin
would there eventually be a blockage? Do standard input and standard output share a buffer of any kind? Specifically via C++ std::cin (istream) and std::cout (ostream) if that's needed to answer. Does the standard require they do or do not share such a thing, or does it leave it up to the implementation?
What would happen?
You can't "attach" a file descriptor from a process to a file descriptor of a different process. What you do (if your operating system supports it) is to assign the two file descriptors to the ends of a "pipe". Pipes are not specified anywhere in the C/C++ standard (they are defined by POSIX), and you won't find any standard C/C++ library function which makes any reference to them at all.
As implemented by Unix (and Unix-like) systems, a pipe is little more than a buffer somewhere in the operating system. While the buffer is not full, a process can write data to the write end of the pipe; the data is simply added to the buffer. While the buffer is not empty, a process can read data from the read end of the pipe; the data is removed from the buffer and handed off to the reading process. If a process tries to write to a pipe whose buffer is full, or read from a pipe whose buffer is empty, the process "blocks": that is, it is marked by the kernel scheduler as not runnable, and it stays in that state until the pipe can handle its request.
The scenario described in the question needs to involve two pipes. One pipe is used to allow the parent's stdout to send data to the child's stdin, and the other is used to allow the child's stdout to send data to the parent's stdin. These two pipes are wholly independent of each other.
Now, if the parent stops reading from its stdin, but the child continues writing to its stdout, then eventually the pipe buffer will become full. (It actually won't take very long. Pipe buffers are not very big, and they don't grow.) At that point, the child will block trying to write to the pipe. If the child is not multithreaded, then once it blocks, that's it. It stops running, so it won't read from its stdin any more. And if the child stops reading from its stdin, then the other pipe will soon become full and the parent will also block trying to write to its stdout.
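You can watch the buffer fill up with a sketch like this (POSIX; the exact capacity varies by system, commonly 64 KiB on Linux):

    #include <cstdio>
    #include <cerrno>
    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        int fds[2];
        if (pipe(fds) == -1)
            return 1;

        // Make the write end non-blocking so we can detect "buffer full"
        // instead of blocking forever (nobody is reading from fds[0]).
        fcntl(fds[1], F_SETFL, O_NONBLOCK);

        long total = 0;
        char byte = 'x';
        while (write(fds[1], &byte, 1) == 1)
            ++total;

        if (errno == EAGAIN)
            std::printf("pipe filled up after %ld bytes\n", total);
        return 0;
    }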
So there's no requirement that resources be shared in order to achieve deadlock.
This is a very well-known bug in processes which spawn a child and try to feed data to the child while reading the child's response. If the reader does not keep up with the data produced, then deadlock is likely. You'll find lots of information about it by searching for, for example, "pipe buffer deadlock". Here are a few sample links, just at random:
Raymond Chen, on MSDN: http://blogs.msdn.com/b/oldnewthing/archive/2011/07/07/10183884.aspx
Right here on StackOverflow (with reference to Python but the issue is identical): Can someone explain pipe buffer deadlock?
David Glasser, from 2006: http://web.mit.edu/6.033/2006/wwwdocs/writing-samples/unix-DavidGlasser.html ("These limitations are not merely theoretical — they can be seen in practice by the fact that no major form of inter-process communication later developed in Unix is layered on top of pipe.")
Yes, it seems I can't. It seems weird that ostream has no close(), since istream can detect end of file.
Here's my situation: I am capturing all the output on POSIX fd 2, in this process and its children, by creating a pipe and dup2'ing the pipe's write end onto fd 2. A thread then reads the read end of the pipe using an associated C stream (and happens to write each line, with a timestamp, to the original fd 2 via another associated C stream).
When all the children are dead, I write a closing message to cerr, then I need to close it so the thread echoing it to the original error file will close the pipe and terminate.
The thread is not detecting eof(), even though I am closing both stderr and fd2.
I have reproduced my main program's behavior with a simple test program, using C streams instead of C++ iostreams, and everything works just fine after fclosing stderr (there are no child processes in that simplified test, though).
Edit: hmm .. do I need to close the original pipe fd after dup2'ing it onto channel 2? I didn't do that, so the underlying pipe still has an open fd attached. Aha .. that's the answer!
When you duplicate a file descriptor with dup2 the original descriptor remains a valid reference to the underlying file. The file won't be closed and the associated resources freed until all file descriptors associated with a particular file are closed (with close).
If you are using dup2 to copy a file descriptor to a well known number (such as 2 for stderr), you usually want to call close on the original file descriptor immediately after a successful dup2.
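In code, the usual pattern looks something like this sketch (capture_stderr is a made-up name; error handling omitted):

    #include <unistd.h>

    // Redirect stderr (fd 2) into the write end of a pipe.
    void capture_stderr(int pipe_write_fd) {
        dup2(pipe_write_fd, 2);   // fd 2 now refers to the pipe
        close(pipe_write_fd);     // drop the original reference...
        // ...otherwise this extra descriptor keeps the pipe open, and a
        // reader on the other end never sees EOF even after fd 2 is closed.
    }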
The streams used for the standard C++ streams are the same as those controlled by the corresponding stdio files. That is, if you fclose(stderr) you also close the stream used for std::cerr... and since you seem to be playing with the various dup() functions, you can also use close(2) to close this stream.
The best approach is to put a wrapper around your resource and have the destructor close it when it goes out of scope (RAII); see the presentation from Bjarne Stroustrup.
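A minimal RAII sketch for a POSIX file descriptor (UniqueFd is a made-up name):

    #include <unistd.h>

    // Closes the wrapped descriptor when the object goes out of scope.
    class UniqueFd {
    public:
        explicit UniqueFd(int fd) : fd_(fd) {}
        ~UniqueFd() { if (fd_ >= 0) close(fd_); }

        UniqueFd(const UniqueFd&) = delete;             // single owner: no copies
        UniqueFd& operator=(const UniqueFd&) = delete;

        int get() const { return fd_; }

    private:
        int fd_;
    };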
I am programming a shell in C++. It needs to be able to pipe the output from one thing to another. For example, on Linux, you can pipe a text file to more by doing cat textfile | more.
My function to pipe one thing to another is declared like this:
void pipeinput(string input, string output);
I send "cat textfile" as the input, and "more" as the output.
In C++ examples that show how to make pipes, fopen() is used. What do I send as my input to fopen()? I have seen C++ examples of piping that use dup2 and some that don't. What's dup2 used for? How do you know whether you need it?
Take a look at popen(3), which is a way to avoid execvp.
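For example, a sketch of reading a command's output through popen (the command string matches the question's example):

    #include <cstdio>

    int main() {
        // Read the output of "cat textfile" as if it were a regular stream.
        FILE *p = popen("cat textfile", "r");
        if (!p)
            return 1;

        char line[256];
        while (std::fgets(line, sizeof line, p))
            std::fputs(line, stdout);   // here you could feed a pager instead

        pclose(p);
        return 0;
    }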
For a simple, two-command pipeline, the function interface you propose may be sufficient. For the general case of an N-stage pipeline, I don't think it is flexible enough.
The pipe() system call is used to create a pipe. In context, you will be creating one pipe before forking. One of the two processes will arrange for the write end of the pipe to become its standard output (probably using dup2()) and will then close both of the file descriptors originally returned by pipe(). It will then execute the command that writes to the pipe (cat textfile in your example). The other process will arrange for the read end of the pipe to become its standard input (probably using dup2() again) and will then close both of the file descriptors originally returned by pipe(). It will then execute the command that reads from the pipe (more in your example).
Of course, there will be still a third process around - the parent shell process - which forked off a child to run the entire pipeline. You might decide you want to refine the mechanisms a bit if you want to track the statuses of each process in the pipeline; the process organization is then a bit different.
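Putting that together, here is a sketch of the two-stage pipeline cat textfile | more using pipe(), fork(), dup2(), and execlp() (error handling mostly omitted):

    #include <unistd.h>
    #include <sys/wait.h>

    int main() {
        int fds[2];
        if (pipe(fds) == -1)
            return 1;

        if (fork() == 0) {            // first child: "cat textfile"
            dup2(fds[1], 1);          // stdout becomes the pipe's write end
            close(fds[0]);            // close both originals from pipe()
            close(fds[1]);
            execlp("cat", "cat", "textfile", (char *)nullptr);
            _exit(127);               // only reached if exec fails
        }

        if (fork() == 0) {            // second child: "more"
            dup2(fds[0], 0);          // stdin becomes the pipe's read end
            close(fds[0]);
            close(fds[1]);
            execlp("more", "more", (char *)nullptr);
            _exit(127);
        }

        // Parent: close its copies so the pipe can deliver EOF, then wait.
        close(fds[0]);
        close(fds[1]);
        while (wait(nullptr) > 0) {}
        return 0;
    }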
fopen() is not used to create pipes. fdopen() can be used to wrap a pipe file descriptor in a FILE stream, but it is not necessary to do so.
Pipes are created with the pipe(2) call before forking off the process. The subprocess has a little bit of file descriptor management to do before exec'ing the command. See the example in pipe(2)'s documentation.