Strange behavior with boost file_sink when forking - c++

I'm observing some strange behavior when I use a file_sink (in boost::iostreams) and then fork() a child process.
The child continues in the same codebase, i.e., there is no exec() call, because this is done as part of daemonizing the process. My full code fully daemonizes the process, of course, but I have omitted the steps that are unnecessary for reproducing the behavior.
The following code is a simplified example that demonstrates the behavior:
#include <iostream>
#include <fstream>
#include <cassert>
#include <cstdlib>
#include <unistd.h>
#include <boost/iostreams/stream_buffer.hpp>
#include <boost/iostreams/device/file.hpp>

using namespace std;
namespace io = boost::iostreams;

void daemonize(std::ostream& log);

int main(int argc, char** argv)
{
    io::stream_buffer<io::file_sink> logbuf;
    std::ostream filelog(&logbuf);
    //std::ofstream filelog;

    // Step 1: open log
    if (argc > 1)
    {
        //filelog.open(argv[1]);
        logbuf.open(io::file_sink(argv[1]));
        daemonize(filelog);
    }
    else
        daemonize(std::cerr);

    return EXIT_SUCCESS;
}

void daemonize(std::ostream& log)
{
    log << "Log opened." << endl;

    // Step 2: fork - parent stops, child continues
    log.flush();
    pid_t pid = fork(); // error checking omitted
    if (pid > 0)
    {
        log << "Parent exiting." << endl;
        exit(EXIT_SUCCESS);
    }
    assert(0 == pid); // child continues

    // Step 3: write to log
    sleep(1); // give parent process time to exit
    log << "Hello World!" << endl;
}
If I run this with no argument (e.g., ./a.out), so that it logs to stderr, then I get the expected output:
Log opened.
Parent exiting.
Hello World!
However, if I do something like ./a.out temp; sleep 2; cat temp then I get:
Log opened.
Hello World!
So the parent is somehow no longer writing to the file after the fork. That's puzzle #1.
Now suppose I just move io::stream_buffer<io::file_sink> logbuf; outside of main() so that it's a global variable. Doing that and simply running ./a.out gives the same expected output as before, but writing to a file (e.g., temp) now produces a new puzzling behavior:
Log opened.
Parent exiting.
Log opened.
Hello World!
The line that writes "Log opened." comes before the fork(), so I don't see why it should appear twice in the output. (I even put an explicit flush() immediately before the fork() to make sure that line of output wasn't simply buffered, with the buffer then copied during the fork() and both copies eventually flushed to the stream...) So that's puzzle #2.
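To illustrate the buffer-copy scenario I mean, here is a minimal sketch with plain stdio, nothing boost-specific; run it with stdout redirected to a file and the unflushed line appears twice:
#include <stdio.h>
#include <unistd.h>

int main()
{
    printf("Log opened.\n"); // stdout redirected to a file is fully buffered,
                             // so this line is not actually written yet
    fork();                  // both processes now hold a copy of the buffer
    return 0;                // each one flushes its copy on exit, so the
                             // line ends up in the file twice
}
But since I flush explicitly before the fork(), that mechanism shouldn't apply here, which is why it's still a puzzle to me.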
Of course, if I comment out the whole fork() process (the entire section labeled as "Step 2") then it behaves as expected for both file and stderr output, and regardless of whether logbuf is global or local to main().
Also, if I switch filelog to be an ofstream instead of stream_buffer<file_sink> (see commented out lines in main()) then it also behaves as expected for both file and stderr output, and regardless of whether filelog/logbuf are global or local to main().
So it really seems that it's an interaction between file_sink and fork() producing these strange behaviors... If anyone has ideas on what may be causing these, I'd appreciate the help!

I think I got it figured out... creating this answer for posterity / anyone who stumbles on this question looking for an answer.
I observed this behavior in boost 1.40, but when I tried it using boost 1.46 everything behaved in the expected manner in all cases, i.e.:
Log opened.
Parent exiting.
Hello World!
So my assumption right now is that this was actually a bug in boost that was fixed somewhere between versions 1.41 and 1.46. I didn't see anything in the release notes that made it really obvious that they found and fixed the bug, but it's possible the release notes discussed fixing some underlying cause and I wasn't able to make the connection between that underlying cause and this scenario.
In any case, the solution seems to be to install boost version >= 1.46.

Related

Program doesn't stop after returning 0 from main

I wrote a program that encodes files with Huffman coding. It works fine but for some reason after returning 0 from main function it doesn't stop.
Main function looks like:
int main(int argc, char *argv[])
{
    if (argc < 5)
    {
        std::cout << "..." << std::endl;
    }
    else
    {
        if (!std::strcmp(argv[1], "haff") && !std::strcmp(argv[2], "-c"))
            HuffmanC(argv[3], argv[4]);
        if (!std::strcmp(argv[1], "haff") && !std::strcmp(argv[2], "-d"))
            HuffmanD(argv[3], argv[4]);
        std::cout << "Operations: " << count << std::endl;
    }
    return 0;
}
When I run it, I get:
MacBook-Pro-Alex:code alex$ ./main haff -c test.txt test.haff
Operations: 37371553
It ends with an empty line, and the terminal says the program is still running; the last cout statement executes fine, and as I understand it the program should then return 0 and finish. How can I make it finish after returning 0? Or is the problem in the rest of the code?
Or is the problem in the rest of the code?
Possibly. Perhaps you've corrupted your stack somehow, so that you're "returning" from main to someplace you didn't expect. We can't really know without a complete, verifiable example.
How can I make it finish after returning 0?
You can use the kill command on MacOS to terminate it forcefully. Using the GUI task manager may or may not work.
But perhaps a more effective course of action would be to attach a debugger to the process and see what it's actually doing.
You could read this explanation on how to do this on MacOS with XCode - but I don't use MacOS, so I wouldn't know. Also, @SergeyA graciously suggests trying pstack to get the process' current stack. Of course, if the stack has been garbled, there's no telling what you'll actually get.
Make sure the application is compiled with debugging information included.
Finally - it's probably better to run the program with a debugger attached in the first place, and set a breakpoint at the last cout << line.

ubuntu server pipeline stop process termination when the first exit

The situation is: I have an external application, so I don't have the source code and I can't change it. While running, the application writes logs to stderr. The task is to write a program that checks its output and separates some part of it into another file. My solution is to start the app like
./externalApp 2>&1 | myApp
myApp is a C++ app with the following source:
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    string str;
    ofstream A;
    A.open("A.log");
    ofstream B;
    B.open("B.log");
    A << "test start" << endl;
    int i = 0;
    while (getline(cin, str))
    {
        if (str.find("asdasd") != string::npos)
        {
            A << str << endl;
        }
        else
        {
            B << str << endl;
        }
        ++i;
    }
    A << "test end: " << i << " lines" << endl;
    A.close();
    B.close();
    return 0;
}
The externalApp can crash or be terminated. At that moment myApp gets terminated too, so it doesn't write the last lines and doesn't close the files. The file can be 60 GB or larger, so saving it and processing it afterwards is not an option.
Correction: My problem is that when the externalApp crashes, it terminates myApp. That means any code after the while block will never run. So the question is: is there a way to keep myApp running even after the externalApp has closed?
How can I do this task correctly? I'm interested in any other ideas for doing this task.
There's nothing wrong with the shown code, and nothing in your question offers any evidence of anything being wrong with the shown code. No evidence was shown that your logging application actually received "the last lines" to be written from that external application. Most likely that external application simply failed to write them to standard output or error, before crashing.
The most likely explanation is that your external application checks if its standard output or error is connected to an interactive terminal; if so each line of its log message is followed by an explicit buffer flush. When the external application's standard output is a pipe, no such flushing takes place, so the log messages get buffered up, and are flushed only when the application's internal output buffer is full. This is a fairly common behavior. But because of that, when the external application crashes its last logged lines are lost forever. Because your logger never received them. Your logger can't do anything about log lines it never read.
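To make that behavior concrete, such applications typically contain something along these lines (a sketch of the general technique, not the external application's actual code):
#include <unistd.h>
#include <stdio.h>

void log_line(const char *msg)
{
    fprintf(stdout, "%s\n", msg);
    if (isatty(fileno(stdout))) // writing to an interactive terminal?
        fflush(stdout);         // flush every line so the user sees it live
    // when stdout is a pipe, lines pile up in the buffer instead
}

int main()
{
    log_line("externalApp starting"); // hypothetical log message
    return 0;
}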
In your situation, the only available option is to set up and connect a pseudo-tty device to the external application's standard output and error, making it think that's connected to an interactive terminal, while its output is actually captured by your application.
You can't do this from the shell. You need to write some code to set this up. You can start by reading the pty(7) manual page which explains the procedure to follow, at which point you will end up with file descriptors that you can take, and attach to your external application.
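On Linux, the forkpty() function from <pty.h> (link with -lutil) bundles most of that procedure. A rough sketch, with error handling omitted and "./externalApp" standing in for the real path:
#include <pty.h>     // forkpty(); link with -lutil
#include <unistd.h>
#include <stdio.h>

int main()
{
    int master;
    pid_t pid = forkpty(&master, NULL, NULL, NULL);
    if (pid == 0)
    {
        // Child: stdin/stdout/stderr are now the pty slave, so the external
        // application believes it is writing to an interactive terminal.
        execl("./externalApp", "externalApp", (char *) NULL);
        _exit(1); // only reached if execl fails
    }

    // Parent: everything the child writes arrives on the master side.
    char buf[4096];
    ssize_t n;
    while ((n = read(master, buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, stdout); // feed this to the A.log/B.log filter instead
    return 0;
}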
If you want your program to cleanly deal with the external program crashing you will probably need to handle SIGPIPE. The default behaviour of this signal is to terminate the process.
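A minimal way to handle it is to ignore the signal, so that a write to the dead pipe fails with EPIPE instead of killing the process:
#include <csignal>

int main()
{
    std::signal(SIGPIPE, SIG_IGN); // a write to a dead pipe now fails with
                                   // EPIPE instead of terminating the process
    // ... the rest of myApp as shown above ...
}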
So the problem was not that when the first element of the pipeline exited it terminated the second. The real problem was that the two apps in the pipeline were launched from a bash script, and when the bash script ended it terminated all of its child processes. I solved it using
signal(SIGHUP,SIG_IGN);
and that way my app ran to completion.
Thank you for all the answers; at least I learned a lot about signals and pipes.

std system is run after going to next line c++

Here is a piece of my code:
void Espresso::run()
{
    std::system("/home/espresso-ab-1.0/src/espresso espresso.in > espresso.out");

    std::string line;
    std::ifstream myfile("espresso.out");
    if (myfile.is_open())
    {
        while (getline(myfile, line))
        {
            std::cout << line << '\n';
        }
        myfile.close();
    }
}
I am wondering whether the above code first runs the system command and completely fills the "espresso.out" file before moving on to the next lines that read it.
If not, how can I make sure the file is fully written before going on to read it?
NOTE: I am restricted to use C++03.
Thanks for your prompt answer. I want to update my question by asking:
- Is it a thread-safe method as well?
std::system is not an async function, so for example, if you run:
std::system("sleep 5");
std::cout << "Foo" << std::endl;
"Foo" will be displayed after 5 seconds.
Of course, if you're on Linux, you could run it like this: std::system("sleep 5 &"). Then the sleep command will run as a background process and the code following the system call will execute immediately.
That said, I encourage you not to use this function: calling system commands by name is dangerous. Imagine what would happen if someone replaced the sleep binary on your system with their own program. Conclusion: your program will block until the system command completes, so your file will be ready.
Yes, the command will be fully completed before the std::system call returns.
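If you want to be defensive about it anyway, you can check the return value before reading. A minimal C++03 sketch, reusing the paths from the question:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // std::system returns only after the shell command has finished.
    int rc = std::system("/home/espresso-ab-1.0/src/espresso espresso.in > espresso.out");
    if (rc != 0) // nonzero: the command failed or could not be run
    {
        std::cerr << "espresso failed, status " << rc << std::endl;
        return 1;
    }

    std::ifstream myfile("espresso.out"); // safe: the file is fully written
    std::string line;
    while (std::getline(myfile, line))
        std::cout << line << '\n';
    return 0;
}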

using exec to execute a system command in a new process

I am trying to spawn a process that executes a system command while my own program continues, with the two processes running in parallel. I am working on Linux.
I looked it up online, and it sounds like I should use the exec() family. But it doesn't work quite as I expected. For example, in the following code, I only see "before" being printed, but not "done".
Am I missing anything?
#include <unistd.h>
#include <iostream>
using namespace std;

int main()
{
    cout << "before" << endl;
    execl("/bin/ls", "/bin/ls", "-r", "-t", "-l", (char *) 0);
    cout << "done" << endl;
}
[UPDATE]
Thank you guys for your comments. Now my program looks like this. Everything works fine, except that at the end I have to press Enter to finish the program. I am not sure why I have to press that last Enter?
#include <unistd.h>
#include <iostream>
using namespace std;

int main()
{
    cout << "before" << endl;
    pid_t pid = fork();
    cout << pid << endl;
    if (pid == 0) {
        execl("/bin/ls", "ls", "-r", "-t", "-l", (char *) 0);
    }
    cout << "done" << endl;
}
You're missing a call to fork. All exec does is replace the current process image with that of the new program. Use fork to spawn a copy of your current process. Its return value will tell you whether it's the child or the original parent that's running. If it's the child, call exec.
Once you've made that change, it only appears that you need to press Enter for the programs to finish. What's actually happening is this: The parent process forks and executes the child process. Both processes run, and both processes print to stdout at the same time. Their output is garbled. The parent process has less to do than the child, so it terminates first. When it terminates, your shell, which was waiting for it, wakes and prints the usual prompt. Meanwhile, the child process is still running. It prints more file entries. Finally, it terminates. The shell isn't paying attention to the child process (its grandchild), so the shell has no reason to re-print the prompt. Look more carefully at the output you get, and you should be able to find your usual command prompt buried in the ls output above.
The cursor appears to be waiting for you to press a key. When you do, the shell prints a prompt, and all looks normal. But as far as the shell was concerned, all was already normal. You could have typed another command before. It would have looked a little strange, but the shell would have executed it normally because it only receives input from the keyboard, not from the child process printing additional characters to the screen.
If you use a program like top in a separate console window, you can watch and confirm that both programs have already finished running before you have to press Enter.
The Exec family of functions replaces the current process with the new executable.
To do what you need, use one of the fork() functions and have the child process exec the new image.
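A minimal sketch of that pattern (the waitpid() call is optional: keep it if you want "done" printed after ls finishes, drop it if you want the two processes to truly run in parallel):
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <iostream>

int main()
{
    std::cout << "before" << std::endl;
    pid_t pid = fork();
    if (pid == 0)
    {
        execl("/bin/ls", "ls", "-r", "-t", "-l", (char *) 0);
        _exit(1); // only reached if execl itself fails
    }
    int status;
    waitpid(pid, &status, 0); // optional: wait so output isn't interleaved
    std::cout << "done" << std::endl;
    return 0;
}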
[response to update]
It is doing exactly what you told it to do. You don't have to press Enter to finish the program: it has already exited, and the shell has already printed a prompt:
[wally@zenetfedora ~]$ ./z
before
22397
done
[wally@zenetfedora ~]$ 0 << here is the prompt (as well as the pid)
total 5102364
drwxr-xr-x. 2 wally wally 4096 2011-01-31 16:22 Templates
...
The output from ls takes a while, so it buries the prompt. If you want output to appear in a more logical order, add sleep(1) (or maybe longer) before the "done".
You're missing the part where execl() replaces your current program in memory with /bin/ls.
I would suggest looking at popen(), which will fork and exec a new process, then let you read from or write to it via a pipe. (Or, if you need both read and write, fork() yourself and then exec().)
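A rough sketch of the popen() route (error handling mostly omitted):
#include <stdio.h> // popen()/pclose() are POSIX, declared here

int main()
{
    FILE *p = popen("/bin/ls -r -t -l", "r"); // forks and execs; output comes
    if (!p)                                   // back to us through a pipe
        return 1;

    char buf[4096];
    while (fgets(buf, sizeof buf, p))
        fputs(buf, stdout); // the parent consumes the child's output

    pclose(p); // waits for the child and returns its exit status
    return 0;
}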

File I/O across processes in Linux?

So, I've got a Linux process where I'm trying to manage some files on a tape. I have the following code that attempts to extract a file, catalog.xml, from the current archive on the tape and copy it to a fixed location (eventually, I'll parse the file and do some work with the results). But, my code is intermittently failing to work correctly. The tar command always succeeds, (and if I check the file system, I see the catalog.xml file), but sometimes my followup check to see if the file exists returns false. Am I doing something obviously wrong? It seems like I'm probably encountering a race condition between the fork()ed process returning and the results of that process being visible on the file system - is there some call I need to make?
pid_t tPid = vfork();
if (0 == tPid)
{
    int tChildRet;
    tChildRet = execlp("tar", "tar", "-xvf", "/dev/nst0", "-C", "/tmp", "catalog.xml", (char *) 0);
    _exit(-1 == tChildRet ? EXIT_FAILURE : EXIT_SUCCESS);
}
else
{
    wait(&ret);
}

std::ifstream tCatalogFile("/tmp/catalog.xml");
if (tCatalogFile)
{
    cout << "File exists!" << endl;
}
else
{
    cout << "File does not exist!" << endl;
}
And I'm getting either "file exists!" or "file does not exist!", seemingly at random.
Other notes:
On the failure cases:
if I do a stat ("/tmp/catalog.xml"), I get a return of -1 with errno set to ENOENT.
the tar command (run with the -v flag) produces the expected one line of output ("catalog.xml")
/tmp is a local tmpfs filesystem; the tape drive is a local device.
I'm using 2.6.30.9 Linux kernel with g++ 4.1.2 on a x86_64 box.
Thanks in advance!
Try calling sync after the wait call in the parent.
If that doesn't work, you may need to loop and/or sleep until the parent can open the file, since you know it's there.
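Something along these lines (a crude sketch; the retry count and delay are arbitrary):
#include <fstream>
#include <iostream>
#include <unistd.h>

int main()
{
    std::ifstream tCatalogFile;
    for (int attempt = 0; attempt < 10; ++attempt)
    {
        tCatalogFile.clear();                 // reset failbit from earlier tries
        tCatalogFile.open("/tmp/catalog.xml");
        if (tCatalogFile.is_open())
            break;
        sleep(1);                             // not visible yet; wait and retry
    }
    std::cout << (tCatalogFile.is_open() ? "File exists!" : "File does not exist!")
              << std::endl;
    return 0;
}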
If execlp succeeds, it will never get to the line where you call _exit. You aren't checking the return value (ret) from wait. It's not clear why you should be using vfork. Etc.
Since the parent is not doing anything else besides waiting for the child to complete, why not make your life easier and just use system()?
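That is, roughly (a sketch reusing the command from the question):
#include <cstdlib>
#include <fstream>
#include <iostream>

int main()
{
    // system() forks, execs a shell, and waits for completion internally.
    int ret = std::system("tar -xvf /dev/nst0 -C /tmp catalog.xml");
    if (ret != 0)
        std::cerr << "tar failed, status " << ret << std::endl;

    std::ifstream tCatalogFile("/tmp/catalog.xml");
    std::cout << (tCatalogFile ? "File exists!" : "File does not exist!")
              << std::endl;
    return 0;
}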