Is output read from popen()ed FILE* complete before pclose()? - c++

pclose()'s man page says:
The pclose() function waits for the associated process to terminate and returns the exit status of the command as returned by wait4(2).
I feel like this means if the associated FILE* created by popen() was opened with type "r" in order to read the command's output, then you're not really sure the output has completed until after the call to pclose(). But after pclose(), the closed FILE* must surely be invalid, so how can you ever be certain you've read the entire output of the command?
To illustrate my question by example, consider the following code:
// main.cpp
#include <iostream>
#include <cstdio>
#include <cerrno>
#include <cstring>
#include <sys/types.h>
#include <sys/wait.h>
int main( int argc, char* argv[] )
{
    FILE* fp = popen( "someExecutableThatTakesALongTime", "r" );
    if ( ! fp )
    {
        std::cout << "popen failed: " << errno << " " << strerror( errno )
                  << std::endl;
        return 1;
    }
    char buf[512] = { 0 };
    fread( buf, sizeof buf, 1, fp );
    std::cout << buf << std::endl;
    // If we're only certain the output-producing process has terminated after the
    // following pclose(), how do we know the content retrieved above with fread()
    // is complete?
    int r = pclose( fp );
    // But if we wait until after the above pclose(), fp is invalid, so
    // there's nowhere from which we could retrieve the command's output anymore,
    // right?
    std::cout << "exit status: " << WEXITSTATUS( r ) << std::endl;
    return 0;
}
My questions, as inline above: if we're only certain the output-producing child process has terminated after the pclose(), how do we know the content retrieved with the fread() is complete? But if we wait until after the pclose(), fp is invalid, so there's nowhere from which we could retrieve the command's output anymore, right?
This feels like a chicken-and-egg problem, but I've seen code similar to the above all over, so I'm probably misunderstanding something. I'm grateful for an explanation on this.

TL;DR executive summary: how do we know the content retrieved with the fread() is complete? Because we've got an EOF.
You get an EOF when the child process closes its end of the pipe. This can happen when it calls close explicitly or exits. Nothing can come out of your end of the pipe after that. After getting an EOF you don't know whether the process has terminated, but you do know for sure that it will never write anything to the pipe.
By calling pclose you close your end of the pipe and wait for termination of the child. When pclose returns, you know that the child has terminated.
If you call pclose without getting an EOF, and the child tries to write stuff to its end of the pipe, it will fail (in fact it will get a SIGPIPE and probably die).
There is absolutely no room for any chicken-and-egg situation here.
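In code, the usual pattern looks like this. A minimal sketch, with "ls -l" as a stand-in command and most error handling trimmed:

#include <cstdio>
#include <iostream>
#include <string>
#include <sys/wait.h>

int main()
{
    FILE* fp = popen( "ls -l", "r" );
    if ( ! fp )
        return 1;

    std::string output;
    char buf[4096];
    size_t n;
    // fread() returns 0 once the child has closed its end of the pipe (EOF),
    // so after this loop the output is complete.
    while ( (n = fread( buf, 1, sizeof buf, fp )) > 0 )
        output.append( buf, n );

    int status = pclose( fp ); // now wait for termination and reap the child
    if ( status != -1 && WIFEXITED( status ) )
        std::cout << "exit status: " << WEXITSTATUS( status )
                  << ", read " << output.size() << " bytes" << std::endl;
    return 0;
}

Read until EOF first, pclose() second: the EOF guarantees the output is complete, and pclose() then tells you how the child exited.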

Read the documentation for popen more carefully:
The pclose() function shall close a stream that was opened by popen(), wait for the command to terminate, and return the termination status of the process that was running the command language interpreter.
It blocks and waits.

I learned a couple things while researching this issue further, which I think answer my question:
Essentially: yes it is safe to fread from the FILE* returned by popen prior to pclose. Assuming the buffer given to fread is large enough, you will not "miss" output generated by the command given to popen.
Going back and carefully considering what fread does: it effectively blocks until (size * nmemb) bytes have been read or end-of-file (or error) is encountered.
Thanks to C - pipe without using popen, I understand better what popen does under the hood: it does a dup2 to redirect its stdout to the write-end of the pipe it uses. Importantly: it performs some form of exec to execute the specified command in the forked process, and after this child process terminates, its open file descriptors, including 1 (stdout) are closed. I.e. termination of the specified command is the condition by which the child process' stdout is closed.
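To make that concrete, here is a rough, hedged sketch of what popen(cmd, "r") might do internally (error handling and the bookkeeping pclose needs are omitted, and my_popen_r is a made-up name for illustration):

#include <cstdio>
#include <unistd.h>

FILE* my_popen_r( const char* cmd )
{
    int fds[2];
    if ( pipe( fds ) == -1 )
        return nullptr;

    pid_t pid = fork();
    if ( pid == 0 ) // child
    {
        close( fds[0] );                  // child doesn't read from the pipe
        dup2( fds[1], STDOUT_FILENO );    // redirect stdout to the write end
        close( fds[1] );
        execl( "/bin/sh", "sh", "-c", cmd, (char*)nullptr );
        _exit( 127 );                     // only reached if exec fails
    }

    close( fds[1] );                      // parent doesn't write to the pipe
    return fdopen( fds[0], "r" );         // wrap the read end in a FILE*
}

When the child terminates, the kernel closes its copy of the write end; since the parent closed its own copy right after the fork, the reader's FILE* then sees EOF.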
Next, I went back and thought more carefully about what EOF really was in this context. At first, I was under the loosey-goosey and mistaken impression that "fread tries to read from a FILE* as fast as it can and returns/unblocks after the last byte is read". That's not quite true: as noted above: fread will read/block until its target number of bytes is read or EOF or error are encountered. The FILE* returned by popen comes from a fdopen of the read-end of the pipe used by popen, so its EOF occurs when the child process' stdout - which was dup2ed with the write-end of the pipe - is closed.
So, in the end what we have is: popen creating a pipe whose write end gets the output of a child process running the specified command, and whose read end is fdopened to a FILE* passed to fread. (Assuming fread's buffer is big enough), fread will block until EOF occurs, which corresponds to closure of the write end of popen's pipe resulting from termination of the executing command. I.e. because fread is blocking until EOF is encountered, and EOF occurs after command - running in popen's child process - terminates, it's safe to use fread (with a sufficiently large buffer) to capture the complete output of the command given to popen.
Grateful if anyone can verify my inferences and conclusions.

popen() is just a shortcut for a series of fork, dup2, execv, fdopen, etc. It gives us easy access to the child's stdout and stdin via file stream operations.
After popen(), the parent and the child process execute independently.
pclose() is not a 'kill' function; it just waits for the child process to terminate. Since it is a blocking function, output the child generates while pclose() is executing could be lost.
To avoid this data loss, we call pclose() only when we know the child has finished writing: a fgets() call returns NULL or fread() returns from blocking, the shared stream reaches its end, and feof() returns true.
Here is an example of using popen() with fread(). This function returns -1 if executing the process failed, 0 if OK. The child's output data is returned in szResult.
int exec_command( const char * szCmd, std::string & szResult ){
    printf("Execute command : [%s]\n", szCmd );
    FILE * pFile = popen( szCmd, "r");
    if(!pFile){
        printf("Execute command : [%s] FAILED !\n", szCmd );
        return -1;
    }
    char buf[256];
    //check if the output stream is ended.
    while( !feof(pFile) ){
        //try to read 255 bytes from the stream, this operation is BLOCKING ...
        int nRead = fread(buf, 1, 255, pFile);
        //nRead may be 0 if there is nothing to read, because the stream is
        //closed or the program caught an error signal
        if( nRead > 0 ){
            buf[nRead] = '\0';
            szResult += buf;
        }
    }
    //the child process has already terminated. Clean it up or we get another
    //zombie in the process table.
    pclose(pFile);
    printf("Exec command [%s] return : \n[%s]\n", szCmd, szResult.c_str() );
    return 0;
}
Note that all file operations on the returned stream work in BLOCKING mode; the stream is opened without the O_NONBLOCK flag. fread() can block forever if the child process hangs and never terminates, so use popen() only with trusted programs.
To take more control over the child process and avoid blocking file operations, we should do the fork/vfork/execv, etc. ourselves, set the O_NONBLOCK flag on the opened pipe descriptors, and use poll() or select() from time to time to determine whether there is data, then use the read() function to read from the pipe.
Use waitpid() with WNOHANG periodically to see if the child process has terminated.
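A hedged sketch of that manual approach (using /bin/ls as a stand-in command; most error handling omitted):

#include <cstdio>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
    int fds[2];
    pipe( fds );

    pid_t pid = fork();
    if ( pid == 0 ) // child: stdout goes into the pipe
    {
        close( fds[0] );
        dup2( fds[1], STDOUT_FILENO );
        close( fds[1] );
        execl( "/bin/ls", "ls", "-l", (char*)nullptr );
        _exit( 127 );
    }

    close( fds[1] );
    fcntl( fds[0], F_SETFL, O_NONBLOCK ); // reads will no longer block

    bool child_done = false;
    char buf[4096];
    for (;;)
    {
        struct pollfd pfd = { fds[0], POLLIN, 0 };
        if ( poll( &pfd, 1, 100 ) > 0 ) // wait up to 100 ms for data
        {
            ssize_t n = read( fds[0], buf, sizeof buf );
            if ( n > 0 )
                fwrite( buf, 1, (size_t)n, stdout );
            else if ( n == 0 )
                break; // EOF: every write end is closed
        }
        // check periodically whether the child has terminated
        if ( !child_done && waitpid( pid, nullptr, WNOHANG ) == pid )
            child_done = true; // terminated; keep draining until EOF
    }
    close( fds[0] );
    if ( !child_done )
        waitpid( pid, nullptr, 0 ); // reap the child if not already done
    return 0;
}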

Related

How to correctly use pipe to transfer data from child process to parent process?

I'm trying to create a function that returns true if execvp is successful and false if it is not. Initially, I didn't use a pipe, and the problem was that whenever execvp failed, I got 2 returns, one false and one true (from the parent). Now that I'm piping, I'm never getting a false returned when execvp fails.
I know there are a lot related questions and answers on this topic, but I can't seem to narrow down where my particular error is. What I want is for my variables return_type_child, return_type_parent, and this->return_type to all contain the same value. I expected that in the child process, execvp would fail so the next lines would execute. As a result, I thought that the 3 variables mentioned would all be false, but instead when I print out the value in this->return_type, 1 is displayed.
bool Command::execute() {
    this->fork_helper();
    return return_type;
}

void Command::fork_helper() {
    bool return_type_child = true;
    int fd[2];
    pipe(fd);
    pid_t child;
    char *const argv[] = {"zf","-la", nullptr};
    child = fork();
    if (child > 0) {
        wait(NULL);
        close(0);
        close(fd[1]);
        dup(fd[0]);
        bool return_type_parent = read(fd[0], &return_type_child, sizeof(return_type_child));
        this->return_type = return_type_parent;
    }
    else if (child == 0) {
        close(fd[0]);
        close(1);
        dup(fd[1]);
        execvp(argv[0], argv);
        this->return_type = false;
        return_type_child = false;
        write(1,&return_type_child,sizeof(return_type_child));
    }
    return;
}
I've also tried putting a cout statement after execvp(argv[0], argv), which never ran. Any help is greatly appreciated!
From the code, it seems to be an XY problem (edit: moved this section to the front due to a comment that confirms this). If the goal is to get the exit status of the child, then for that there is the value that wait returns, and no pipes are required:
int stat;
wait(&stat);
Read the manual of wait to figure out how to read it. The value of stat can be tested as follows:
WEXITSTATUS(stat) - If WIFEXITED(stat) != 0, then these are the lower 8 bits of the child's call to exit(N) or of the return value from main. It might work correctly without checking WIFEXITED, but the standard does not specify that.
WTERMSIG(stat) - If WIFSIGNALED(stat) != 0, then this is the signal number that caused the process to exit (e.g. 11 is segmentation fault). It might work correctly without checking WIFSIGNALED, but the standard does not specify that. A minimal status-checking sketch follows below.
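The sketch uses "ls" in place of the question's "zf" (which presumably does not exist) so that the exec actually succeeds; the structure is what matters:

#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    pid_t pid = fork();
    if ( pid == 0 ) // child
    {
        execlp( "ls", "ls", "-la", (char*)nullptr );
        _exit( 127 ); // reached only if exec itself failed
    }

    int stat;
    wait( &stat );
    if ( WIFEXITED( stat ) )
        std::printf( "exited with status %d\n", WEXITSTATUS( stat ) );
    else if ( WIFSIGNALED( stat ) )
        std::printf( "killed by signal %d\n", WTERMSIG( stat ) );
    return 0;
}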
There are several errors in the code. See the added comments:
void Command::fork_helper() {
    // File descriptors here: 0=stdin, 1=stdout, 2=stderr
    // (and 3..N opened in the program, could also be none).
    bool return_type_child = true;
    int fd[2];
    pipe(fd);
    // File descriptors here: 0=stdin, 1=stdout, 2=stderr
    // (and 3..N opened in the program, could also be none).
    // N+1=fd[0] data exhaust of the pipe
    // N+2=fd[1] data intake of the pipe
    pid_t child;
    char *const argv[] = {"zf","-la", nullptr};
    child = fork();
    if (child > 0) {
        // This code is executed in the parent.
        wait(NULL); // wait for the child to complete.
This wait is a potential deadlock: if the child writes enough data to the pipe (usually in the kilobytes), the write blocks and waits for the parent to read the pipe. The parent's wait(NULL) waits for the child to complete, which in turn waits for the parent to read the pipe. This is likely not affecting the code in question, but it is problematic.
        close(0);
        close(fd[1]);
        dup(fd[0]);
        // File descriptors here: 0=new stdin=data exhaust of the pipe
        // 1=stdout, 2=stderr
        // (and 3..N opened in the program, could also be none).
        // N+1=fd[0] data exhaust of the pipe (stdin is now a duplicate)
This is problematic since:
the code just lost the original stdin;
the pipe is never closed. You should close fd[0] explicitly, don't close(0), and don't duplicate fd[0].
It is a good idea to avoid having duplicate descriptors, except for having stderr duplicate stdout.
        bool return_type_parent = read(fd[0], &return_type_child, sizeof(return_type_child));
        this->return_type = return_type_parent;
    }
    else if (child == 0) {
        // this code runs in the child.
        close(fd[0]);
        close(1);
        dup(fd[1]);
        // File descriptors here: 0=stdin, 1=new stdout=pipe intake, 2=stderr
        // (and 3..N opened in the program, could also be none).
        // N+2=fd[1] pipe intake (new stdout is a duplicate)
This is problematic, since there are two duplicate data intakes to the pipe. In this case it is not critical, since they are both closed automatically when the process ends, but it is a bad practice: only closing all the pipe intakes signals END-OF-FILE to the exhaust. Closing one intake but not the other does not signal END-OF-FILE. Again, in your case it is not causing trouble, since the child's exit closes all the intakes.
        execvp(argv[0], argv);
The code below the above line is never reached, unless execvp itself failed. execvp fails only when the file does not exist or the caller has no permission to execute it. If the executable starts to execute and fails later (possibly even if it fails to read a shared library), execvp itself still succeeds and never returns. This is because execvp replaces the executable, and the following code is no longer in memory when execvp starts to run the other program.
        this->return_type = false;
        return_type_child = false;
        write(1,&return_type_child,sizeof(return_type_child));
    }
    return;
}
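As an aside: if the goal really is to learn whether execvp itself succeeded (rather than the child's exit status), the classic technique is a close-on-exec pipe. The write end disappears automatically when exec succeeds, so the parent sees EOF on success and an errno value on failure. A hedged sketch:

#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
    int fd[2];
    pipe( fd );
    fcntl( fd[1], F_SETFD, FD_CLOEXEC ); // write end closes on successful exec

    pid_t child = fork();
    if ( child == 0 )
    {
        close( fd[0] );
        char *const argv[] = { (char*)"zf", (char*)"-la", nullptr };
        execvp( argv[0], argv );
        int err = errno;                 // exec failed: report errno to parent
        write( fd[1], &err, sizeof err );
        _exit( 127 );
    }

    close( fd[1] );
    int err = 0;
    ssize_t n = read( fd[0], &err, sizeof err ); // 0 (EOF) means exec succeeded
    close( fd[0] );
    wait( nullptr );
    if ( n == 0 )
        std::printf( "execvp succeeded\n" );
    else
        std::printf( "execvp failed: errno %d\n", err );
    return 0;
}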

Closing unused end in pipes

I was reading about pipes in my operating system course and writing some code to understand it better. I have a doubt regarding the following code:
int fd[2]; // CREATING PIPE
pipe(fd);
int status;
int pid=fork();
if(pid==0)
{
    // WRITER PROCESS
    srand(123);
    int arr[3]={1,2,3};
    close(fd[0]); // CLOSE UNUSED (READING END)
    for(int i=0;i<3;i++)
        write(fd[1],&arr[i],sizeof(int));
    close(fd[1]); // CLOSE WRITING END AFTER WRITING SO THAT READ GETS THE EOF
}
else
{
    // READER PROCESS
    int arr[10];
    int i=0;
    int n_bytes;
    //close(fd[1]); // CLOSE UNUSED (WRITING END)
    while((n_bytes=read(fd[0],&arr[i],sizeof(int)))>0) // READING IN A LOOP UNTIL END
        i++;
    close(fd[0]); // CLOSE READING END after reading
    for(int j=0;j<i;j++)
        cout<<arr[j]<<endl;
    while(wait(&status)>0)
        ;
}
If I run this, the read gets blocked; if I uncomment the close(fd[1]) line in the reader process, the code runs fine.
That means close(fd[1]) closes the write end and read can proceed.
My doubt is: even if I don't close the write end in the reader process, it gets closed at the end of the writer process. So why is the read syscall still getting blocked?
Initially, both processes have open file descriptors to both the read and write ends of the pipe.
The OS will only close an end of the pipe when all open file descriptors to it have been closed, so if you don't call close(fd[1]) in the reading process, one file descriptor will remain open, the write end of the pipe will not be closed, and read will block waiting for input that will never come.
Two problems:
The first is that due to operator precedence the loop condition n_bytes=read(fd[0],&arr[i],sizeof(int))>0 is really equal to n_bytes = (read(fd[0],&arr[i],sizeof(int)) > 0). That is, you assign the value of the comparison to the variable n_bytes. To correct this, add extra parentheses around the assignment, as in (n_bytes=read(fd[0],&arr[i],sizeof(int)))>0.
The second problem is that both the parent and the child process will call wait in a loop. You should only do that in the parent process to wait for the child.
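Putting both fixes together, a corrected sketch of the program might look like this (same structure as the question's code, with the usual headers assumed):

#include <iostream>
#include <sys/wait.h>
#include <unistd.h>
using namespace std;

int main()
{
    int fd[2];
    pipe(fd);
    int status;
    int pid = fork();
    if ( pid == 0 )
    {
        // WRITER PROCESS (child)
        int arr[3] = {1, 2, 3};
        close(fd[0]); // close unused read end
        for ( int i = 0; i < 3; i++ )
            write(fd[1], &arr[i], sizeof(int));
        close(fd[1]); // close write end so the reader gets EOF
    }
    else
    {
        // READER PROCESS (parent)
        int arr[10];
        int i = 0;
        int n_bytes;
        close(fd[1]); // close unused write end, or read() never sees EOF
        while ( (n_bytes = read(fd[0], &arr[i], sizeof(int))) > 0 ) // note the parentheses
            i++;
        close(fd[0]);
        for ( int j = 0; j < i; j++ )
            cout << arr[j] << endl;
        wait(&status); // only the parent waits for the child
    }
    return 0;
}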

Unnamed pipe gets blocked although there is data to read

I'm having some problems with unnamed pipes or "fifos" in C. I have two executable files: one tries to read, the other one tries to write. The reader is meant to be executed only once. I tried to write a simple example to show my problem, so it reads 10 times and then it gets closed. However, the writer should be executed many times (in my original program, it can't be executed twice at once: you have to wait for it to finish to run it again).
The problem with this code is: it only prints the incoming message when another one arrives. It seems that it gets blocked until it receives another message. I don't know what is happening, but it seems the "read" line blocks the program although there is data to read, and it works again when I send new data.
I tried another thing: As you can see the writer closes the file descriptor. The reader opens the file descriptor twice, because it would find EOF and get unblocked if it didn't. I tried eliminating those lines (the writer wouldn't close the fd, the reader would open the fd just once, eliminating the second "open()"). But for some reason, it unblocks if I do that. Why does that happen?
This is my code:
Writer:
int main () {
    int fd;
    static const std::string FILE_FIFO = "/tmp/archivo_fifo";
    mknod ( static_cast<const char*>(FILE_FIFO.c_str()),S_IFIFO|0666,0 );
    std::string mess = "Hii!! Example";
    //open:
    fd = open ( static_cast<const char*>(FILE_FIFO.c_str()),O_WRONLY );
    //write:
    write ( fd, static_cast<const void*>(mess.c_str()) ,mess.length() );
    std::cout << "[Writer] I wrote " << mess << std::endl;
    //close:
    close ( fd );
    fd = -1;
    std::cout << "[Writer] END" << std::endl;
    exit ( 0 );
}
Reader:
int main () {
    int i,fd;
    static const int BUFFSIZE = 100;
    static const std::string name = "/tmp/archivo_fifo";
    mknod ( static_cast<const char*>(name.c_str()),S_IFIFO|0666,0 );
    char buffer[BUFFSIZE];
    i=0;
    fd = open ( name.c_str(),O_RDONLY );
    while (true) {
        i++;
        std::cout << "Waiting to read Fifo: "<< i << std::endl;
        ssize_t bytesLeidos = read ( fd,static_cast<void*>(buffer),BUFFSIZE);
        fd = open ( name.c_str(),O_RDONLY );
        std::string mess = buffer;
        mess.resize ( bytesLeidos );
        std::cout << "[Reader] I read: " << mess << std::endl;
        sleep(3);
        if (i==10) break;
    }
    close ( fd );
    fd = -1;
    unlink ( name.c_str() );
    std::cout << "[Reader] END" << std::endl;
    exit ( 0 );
}
Thanks in advance. And please excuse my poor English
You should use the select call to find out if any data is available on the fd of the pipe.
Have a look at
http://en.wikipedia.org/wiki/Select_(Unix)
You've opened the file in blocking mode:
If some process has the pipe open for writing and O_NONBLOCK is clear, read() shall block the calling thread until some data is written or the pipe is closed by all processes that had the pipe open for writing.
Depending on your goals, you should either synchronize the readers and writers of your pipe, or use non-blocking mode for the reader. Read about poll, epoll, select.
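For example, here is a minimal poll()-based sketch of the reader, assuming the same /tmp/archivo_fifo path. Opening the FIFO O_RDWR, so that the process itself always holds a write end, is a Linux-specific but common trick that prevents spurious EOFs between writers:

#include <cstdio>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int main()
{
    int fd = open( "/tmp/archivo_fifo", O_RDWR ); // never blocks, never EOFs
    if ( fd == -1 )
        return 1;

    char buf[100];
    for ( int i = 0; i < 10; ++i ) // read 10 messages, as in the question
    {
        struct pollfd pfd = { fd, POLLIN, 0 };
        // block until some writer puts data into the FIFO
        if ( poll( &pfd, 1, -1 ) > 0 && ( pfd.revents & POLLIN ) )
        {
            ssize_t n = read( fd, buf, sizeof buf );
            if ( n > 0 )
                fwrite( buf, 1, (size_t)n, stdout );
        }
    }
    close( fd );
    return 0;
}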
I've been reading more about unnamed pipes and now I understand the problem. I wrote:
the reader opens the file descriptor twice, because it would find EOF and get unblocked if it didn't. I tried eliminating those lines (the writer wouldn't close the fd, the reader would open the fd just once, eliminating the second "open()"). But for some reason, it unblocks if I do that. Why does that happen?
It unblocks because the other process exits, so the OS closes its file descriptor anyway. That's why, although I didn't write close(fd), it unblocks.
The only ways in which a blocking FIFO read can unblock are:
1) there is data to read
2) the other program closed the file descriptor. If there is no data to read and the writer closed its file descriptor (even if the file descriptor is still open in the reader), read() returns 0 and unblocks.
So my solution was: redesign my program so that it keeps the writer's file descriptor open all the time. Which means: there is only one executable file now. I'm pretty sure I could have done it with two executables, but I would probably need semaphores or something like that to synchronize, so it wouldn't try to read when the writer's fd is closed.

Is it possible to read intermittently-sent data through a named pipe using redirection to stdin?

Is it possible to read intermittently-sent data through a named pipe using redirection to stdin?
What I'd like to do is this:
$ mkfifo pipe
$ ./test < pipe
In another terminal:
$ cat datafile > pipe
$ cat datafile > pipe
repeatedly dumping information into the pipe. This only works the first time.
Here's a demonstration program for test that shows the behavior:
#include <iostream>
using namespace std;

int main(int argc, char *argv[]) {
    char input_string[30];
    while(1) {
        while( cin.read(input_string, 30) || cin.gcount()!=0 ) {
            cout << "!" << endl;
        }
    }
    return 1;
}
So, what's going on? Does redirection only provide the contents of a single send to the pipe? I've already written a version of the actual production code that takes in the name of the pipe as a parameter and keeps it open for writing this way, and maybe that's the answer. But I'm wondering if there's a way to do this with redirection.
When you redirect the input from the pipe like this:
./test < pipe
The shell opens the pipe for reading and then starts your program. But opening the pipe does not complete until a writer exists -- that is, open(2) blocks. When another process opens the pipe for writing, the original open call completes, and the two can communicate. When the writer closes its end of the pipe, the read end also closes -- the reader gets an EOF.
Once that cycle completes, you can reopen the pipe for reading and start another cycle, but you have to do it yourself. So if you're reading from stdin, you'll have to restart your program. Alternatively, you can just reopen the pipe on a different file descriptor, e.g.:
// Error checking omitted for expository purposes
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    while(1)
    {
        int fd = open("pipe", O_RDONLY);
        char buffer[30];
        int n;
        while((n = read(fd, buffer, sizeof(buffer))) > 0)
        {
            // Process input
        }
        close(fd);
    }
    return 0;
}
If you want to wrap the raw I/O in a stdio FILE*, you can use fdopen(3); I'm not aware of a way to wrap a file descriptor in a C++ stream object, though it might be possible.
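For the stdio route, a small sketch with fdopen(3):

#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd = open( "pipe", O_RDONLY ); // blocks until a writer appears
    if ( fd == -1 )
        return 1;

    FILE* f = fdopen( fd, "r" );       // the FILE* now owns the descriptor
    char line[256];
    while ( fgets( line, sizeof line, f ) ) // returns NULL at EOF
        fputs( line, stdout );
    fclose( f );                       // also closes fd
    return 0;
}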
$ cat datafile > pipe
sends the content of datafile to the pipe; when cat finishes, it closes its end of the pipe, which the reader sees as EOF (end of file). At this point the redirection is closed, and data pushed to the pipe afterwards is not redirected to ./test anymore.

Using posix pipe() and dup() with C++ to redirect I/O problems

I have to modify a simple shell I wrote for a previous homework assignment to handle I/O redirection, and I'm having trouble getting the pipes to work. It seems that when I write to stdout and read from stdin after duplicating the file descriptors in the separate processes, the pipe works, but if I use anything like printf, fprintf, gets, fgets, etc. to see whether the output is showing up in the pipe, it goes to the console, even though the file descriptors for stdin and stdout clearly are copies of the pipe ends (I don't know if that's the correct way to phrase that, but the point is clear I think).
I am 99.9% sure that I am doing everything as it should be at least in plain C -- such as closing all the file descriptors appropriately after the dup() -- and file I/O works fine, so this seems like an issue of a detail that I am not aware of and cannot find any information on. I've spent most of the day trying different things and the past few hours googling trying to figure out if I could redirect cin and cout to the pipe to see if that would fix it, but it seems like it's more trouble than it's worth at this point.
Should this work just by redirecting stdin and stdout, since cin and cout are supposed to be sync'd with stdio? I thought it should, especially since the commands are probably written in C and would therefore use stdio. However, if I try a command like "cat [file1] [file2] | sort", it prints the result of cat [file1] [file2] to the command line, and the sort doesn't get any input so it has no output. It's also clear that cout and cin are not affected by the dup() either, so I put two and two together and came to this conclusion.
Here is a somewhat shortened version of my code minus all the error checking and things like that, which I am confident I am handling well. I can post the full code if it come to it, but it's a lot so I'll start with this.
I rewrote the function so that the parent forks off a child for each command, connects them with pipes as necessary, and then waits for the child processes to die. Again, write and read on the file descriptors 0 and 1 work (i.e. write to and read from the pipe); stdio on the FILE pointers stdin and stdout does not work (does not write to the pipe).
Thanks a lot, this has been killing me...
UPDATE: I wasn't changing the string cmd for each of the different commands so it didn't appear to work because the pipe just went to the same command so the final output was the same... Sorry for the dumbness, but thanks because I found the problem with strace.
int call_execv( string cmd, vector<string> &argv, int argc,
                vector<int> &redirect)
{
    int result = 0, pid, /* some other declarations */;
    bool file_in, file_out, pipe_in, pipe_out;
    queue<int*> pipes; // never has more than 2 pipes

    // parse, fork, exec, & loop if there's a pipe until no more pipes
    do
    {
        /* some declarations for variables used in parsing */
        file_in = file_out = pipe_in = pipe_out = false;

        // parse the next command and set some flags
        while( /* there's more redirection */ )
        {
            string symbol = /* next redirection symbol */
            if( symbol == ">" )
            {
                /* set flags, get filename, etc */
            }
            else if( symbol == "<" )
            {
                /* set flags, get filename, etc */
            }
            else if( pipe_out = (symbol == "|") )
            {
                /* set flags, and... */
                int tempPipes[2];
                pipes.push( pipe(tempPipes) );
                break;
            }
        }
        /* ... set some more flags ... */

        // fork child
        pid = fork();
        if( pid == 0 ) // child
        {
            /* if pipe_in and pipe_out set, there are two pipes in queue.
               the old pipe's read is dup'd to stdin, and the new pipe's
               write is dup'd to stdout, other two FD's are closed */
            /* if only pipe_in or pipe_out, there is one pipe in queue.
               the unused end is closed in whichever if statement evaluates */
            /* if neither pipe_in or pipe_out is set, no pipe in queue */

            // redirect stdout
            if( pipe_out ){
                // close newest pipe's read end
                close( pipes.back()[P_READ] );
                // dup the newest pipe's write end
                dup2( pipes.back()[P_WRITE], STDOUT_FILENO );
                // close newest pipe's write end
                close( pipes.back()[P_WRITE] );
            }
            else if( file_out )
                freopen(outfile.c_str(), "w", stdout);

            // redirect stdin
            if( pipe_in ){
                close( pipes.front()[P_WRITE] );
                dup2( pipes.front()[P_READ], STDIN_FILENO );
                close( pipes.front()[P_READ] );
            }
            else if ( file_in )
                freopen(infile.c_str(), "r", stdin);

            // create argument list and exec
            char **arglist = make_arglist( argv, start, end );
            execv( cmd.c_str(), arglist );
            cout << "Execution failed." << endl;
            exit(-1); // this only executes if execv fails
        } // end child

        /* close the newest pipe's write end because child is writing to it.
           the older pipe's write end is closed already */
        if( pipe_out )
            close( pipes.back()[P_WRITE] );

        // remove pipes that have been read from front of queue
        if( init_count > 0 )
        {
            close( pipes.front()[P_READ] ); // close FD first
            pipes.pop(); // pop from queue
        }
    } while ( pipe_out );

    // wait for each child process to die
    return result;
}
Whatever the problem, you are not checking any return values. How do you know if the pipe() or the dup2() call succeeded? Have you verified that stdout and stdin really point to the pipe right before execv? Does execv keep the file descriptors you give it? Not sure; here is the corresponding paragraph from the execve documentation:
By default, file descriptors remain open across an execve(). File descriptors that are marked close-on-exec are closed; see the description of FD_CLOEXEC in fcntl(2). (If a file descriptor is closed, this will cause the release of all record locks obtained on the underlying file by this process. See fcntl(2) for details.) POSIX.1-2001 says that if file descriptors 0, 1, and 2 would otherwise be closed after a successful execve(), and the process would gain privilege because the set-user-ID or set-group-ID permission bit was set on the executed file, then the system may open an unspecified file for each of these file descriptors. As a general principle, no portable program, whether privileged or not, can assume that these three file descriptors will remain closed across an execve().
You should add more debug output and see what really happens. Did you use strace -f (to follow children) on your program?
The following:
queue<int*> pipes; // never has more than 2 pipes
// ...
int tempPipes[2];
pipes.push( pipe(tempPipes) );
Is not supposed to work. It is not clear how it even compiles, since the result of pipe() is int, not int*. Not only that: tempPipes goes out of scope and its contents get lost.
It should be something like this:
struct PipeFds
{
    int fds[2];
};

std::queue<PipeFds> pipes;

PipeFds p;
pipe(p.fds); // check the return value
pipes.push(p);