(no semaphores or threading, just processes)
I want to read data from a file in the parent and pass it to the child through a pipe.
Suppose the data in the file is
Is
This
Possible?
Now, after the child has read "Is" through the pipe,
how would the child know that new data ("This") has been passed and should be read?
And what would be the terminating condition after reading "Possible?" through the pipe, so that the child can terminate after reading all the data the parent wanted to pass?
(Doing it without using semaphores or threads, just plain processes, i.e. forking.)
Thanks in Advance
A parent writing to a file and the child reading from it would require the synchronization you're thinking of. That is, if parent has only written the 1st line and child has read it, but parent has not written line 2, child will get a premature EOF.
But, a pipe does not.
A pipe stays open until the parent/sender closes it [or child terminates]. So, the child can just read in a loop until it receives EOF.
The child will automatically block in the read if no data is available but will not get EOF prematurely. If you want, the child can do select(2) or poll(2) to check for data being available but I hardly think that's necessary.
The child will not get EOF until the parent has sent all the data and closed its end of the pipe.
So, no synchronization is needed.
On the other side, we may have a parent that sends lots of data quickly while the child reads slowly (i.e. it falls behind a bit). Eventually, the [kernel] pipe buffer gets "full" and the parent's write will block until the child has been able to "catch up" and drain some of the data. Thus, no data is "lost".
You can simply read a fixed amount of data using the fairly ubiquitous read() or fread() APIs, and rely on the fact that these will either block until more data is available or signal an end-of-file condition. This is the most straightforward way to pipe data into child processes: the child simply reads from stdin like an ordinary file, until it encounters the end-of-file condition.
Alternatively, for a more responsive/performant design (or when dealing with hardware that signals on file objects) you need:
Enable support for nonblocking I/O
Monitor your input file descriptor/handle (the read side of the pipe) using a select()-like API
Integrate with an event loop (or write a simple one yourself)
Be able to deal with unexpected amounts of data available.
Be able to deal with errors from read()/fread() and friends, in particular EAGAIN.
And be able to deal with end-of-file conditions.
Wiring this up means delving into OS-specific APIs, but fortunately it's also a common enough task that plenty of toolkit libraries (e.g. libuv, Qt) exist and provide a consistent abstraction/higher-level API.
In Linux, how does one generate an event to break out of a select/poll/epoll loop on thread termination? Processes have a pidfd and SIGCHLD. Is there something similar for threads?
Edit: this is to directly monitor the thread termination event.
Well, the most obvious solution that comes to mind is that one of the file descriptors being polled/selected on would be a very special file descriptor, reserved for that particular purpose. When you want to "break out" of the select/poll/epoll you simply make the appropriate arrangements for this very special file descriptor to become available for reading, and this will make it happen.
After select/poll/epoll returns you'll check that file descriptor, just like you would check any other one, and proceed according to whatever should happen in that event. So the only remaining part of this question is what kind of a very special file descriptor would this be?
Well, since you tagged your question with linux, you have many Linux-specific options to choose from.
You can turn off native signal handling in your process and create a signal file descriptor (signalfd(2)). A signal sent to the process then translates into the signal file descriptor becoming available for reading, and reading from it, as documented in the manual page, tells you that the signal has been received.
An event file descriptor (eventfd(2)) could be another option; this one is more suitable for different threads in the same process notifying each other.
Both event and signal file descriptors are eminently pollable/selectable. And there's always the old-school approach of creating a pipe(), selecting/polling the read end of the pipe, and writing to the write end of the pipe to effect the notification.
Is it normal, for a given file descriptor shared between a forked parent and child process, that the file position in the parent process remains the same after a child process reads from the same file descriptor?
This is happening for me. Here's the setup:
I am writing a C++ CGI program, so it reads HTTP requests from stdin. When processing a multipart_form, I process stdin with an intermediary object (Multipart_Pull) that has a getc() method that detects the boundary strings and returns EOF at the end of each field, so I can pretend a field's contents are a file. When the field is a file upload, I fork twice in order to pipe the results of Multipart_Pull::getc to the stdin of a child process that runs ssconvert to make a CSV file from an Excel file for further processing. I wrote the child process to leave the file pointer at the position where the parent could pick it up. The parent process uses wait() to ensure the child processes are done before continuing.
For testing while developing Multipart_Pull, I am faking stdin by opening a disk file that was copied from a real multipart_form request.
When faking stdin, and after the child process returns, the first character read in the parent process is the same first character that the child process read when it started. That is, the file pointer didn't move in the parent's copy of the file.
I have confirmed that the child process actually reads the data by running gdb and following the appropriate child process by using set follow-fork-mode child, and also confirmed the file position of the parent on return by comparing the characters read against the file from which the data is read.
When I am really reading from stdin, I don't expect that this will be a problem because (correct me if I'm wrong here), when you read a character from stdin, it's gone forever.
I realize that there are workarounds to solve this particular problem, the easiest being to just ignore any fields that follow a file upload on a multipart_form, i.e. the parent doesn't try to continue reading after the fork. However, I hate to cripple the production code or make unnecessary restrictions, and mainly because I really just want to understand what's happening.
Thanks in advance.
Is it normal, for a given file descriptor shared between a forked parent and child process, that the file position in the parent process remains the same after a child process reads from the same file descriptor?
Since you bring up fork(), I presume you are working with a POSIX-compliant system. Otherwise, the answer is subject to the specific details of your C++ implementation.
In POSIX terminology, file descriptors and streams are both types of "handles" on an underlying "open file description". There may be multiple distinct handles on the same open file description, potentially held by different processes. The fork() function is one way in which such a situation may arise.
In the event that multiple handles on the same open file description are manipulated, POSIX explicitly declares the results unspecified except under specific conditions. Your child processes satisfy their part of those requirements by closing their streams, either explicitly or as a consequence of normal process termination. According to POSIX, however, for the parent's subsequent use of its stream to have specified behavior, it "shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location."
In other words, the parent process cannot rely on the child processes' manipulation of the file offset to automatically be visible to it, and in fact cannot rely on any particular offset at all after the children manipulate their copies of the stream.
Are standard input and standard output independent or not?
Consider a parent program had launched a child, and the parent's standard output was attached to the child's standard input, and the child's standard output was attached to the parent's standard input.
           stdin  <-  stdout
    parent                    child
           stdout ->  stdin
If the child (asynchronously) continually read from its standard input and wrote data to its standard output, but the parent just wrote to the child's standard input and didn't read from the child's standard output at all:
           stdin|  <<      stdout
    parent                        child
           stdout  ==>==>  stdin
would there eventually be a blockage? Do standard input and standard output share a buffer of any kind? Specifically via C++ std::cin (istream) and std::cout (ostream) if that's needed to answer. Does the standard require they do or do not share such a thing, or does it leave it up to the implementation?
What would happen?
You can't "attach" a file descriptor from a process to a file descriptor of a different process. What you do (if your operating system supports it) is to assign the two file descriptors to the ends of a "pipe". Pipes are not specified anywhere in the C/C++ standard (they are defined by POSIX), and you won't find any standard C/C++ library function which makes any reference to them at all.
As implemented by Unix (and Unix-like) systems, a pipe is little more than a buffer somewhere in the operating system. While the buffer is not full, a process can write data to the input end of the pipe; the data is simply added to the buffer. While the buffer is not empty, a process can read data from the output end of the buffer; the data is removed from the buffer and handed off to the reading process. If a process tries to write to a pipe whose buffer is full or read from a pipe whose buffer is empty, the process "blocks": that is, it is marked by the kernel scheduler as not runnable, and it stays in that state until the pipe can handle its request.
The scenario described in the question needs to involve two pipes. One pipe is used to allow the parent's stdout to send data to the child's stdin, and the other is used to allow the child's stdout to send data to the parent's stdin. These two pipes are wholly independent of each other.
Now, if the parent stops reading from its stdin, but the child continues writing to its stdout, then eventually the pipe buffer will become full. (It actually won't take very long. Pipe buffers are not very big, and they don't grow.) At that point, the child will block trying to write to the pipe. If the child is not multithreaded, then once it blocks, that's it. It stops running, so it won't read from its stdin any more. And if the child stops reading from its stdin, then the other pipe will soon become full and the parent will also block trying to write to its stdout.
So there's no requirement that resources be shared in order to achieve deadlock.
This is a very well-known bug in processes which spawn a child and try to feed data to the child while reading the child's response. If the reader does not keep up with the data produced, then deadlock is likely. You'll find lots of information about it by searching for, for example, "pipe buffer deadlock". Here are a few sample links, just at random:
Raymond Chen, on MSDN: http://blogs.msdn.com/b/oldnewthing/archive/2011/07/07/10183884.aspx
Right here on StackOverflow (with reference to Python but the issue is identical): Can someone explain pipe buffer deadlock?
David Glasser, from 2006: http://web.mit.edu/6.033/2006/wwwdocs/writing-samples/unix-DavidGlasser.html ("These limitations are not merely theoretical — they can be seen in practice by the fact that no major form of inter-process communication later developed in Unix is layered on top of pipe.")
I'm doing some research on the Linux kernel, particularly the input subsystem. I'm interested in reading /dev/input/eventX device(s) for different input events (mainly keyboard and mouse).
However, the read() operation blocks. The only thing I can think of is keeping a state of all the keyboard keys and mouse buttons, then creating a new thread for reading the keyboard and mouse states (those threads might be blocked from time to time), and accessing that state from my main process.
However, I'm not very experienced in non-blocking programming under C++ and Linux, and I think that a thread for each device might be overkill.
I'd like to know if there are other ways to handle input in a non-blocking way, or whether using threads is fine.
Thanks, skwee.
You can check out the poll system call for this. It is for handling I/O on multiple file descriptors. One possibility would be to spawn only one thread to poll for events on multiple file descriptors.
Here is some reading material : http://www.makelinux.net/ldd3/chp-6-sect-3
You can set the file descriptor to non-blocking. You can also use select/poll to check whether data is available to be read, in which case you don't need non-blocking. See this thread:
Non-blocking call for reading descriptor
I have a program that creates pipes between two processes. One process constantly monitors the output of the other and, when specific output is encountered, gives input through the other pipe with the write() function. The problem I am having, though, is that the contents of the pipe don't go through to the other process's stdin stream until I close() the pipe. I want this program to loop indefinitely and react every time it encounters the output it is looking for. Is there any way to send the input to the other process without closing the pipe?
I have searched a bit and found that named pipes can be reopened after closing them, but I wanted to find out if there was another option since I have already written the code to use unnamed pipes and I haven't yet learned to use named pipes.
Take a look at using fflush.
How are you reading the other end? Are you expecting complete strings? You aren't sending terminating NULs in the snippet you posted. Perhaps sending strlen(string)+1 bytes will fix it. Without seeing the code it's hard to tell.
Use fsync. http://pubs.opengroup.org/onlinepubs/007908799/xsh/fsync.html
From http://www.delorie.com/gnu/docs/glibc/libc_239.html:
Once write returns, the data is enqueued to be written and can be read back right away, but it is not necessarily written out to permanent storage immediately. You can use fsync when you need to be sure your data has been permanently stored before continuing. (It is more efficient for the system to batch up consecutive writes and do them all at once when convenient. Normally they will always be written to disk within a minute or less.) Modern systems provide another function fdatasync which guarantees integrity only for the file data and is therefore faster. You can use the O_FSYNC open mode to make write always store the data to disk before returning.