Ubuntu server pipeline: stop process termination when the first process exits - C++

The situation is: I have an external application; I don't have the source code and I can't change it. While running, the application writes logs to stderr. The task is to write a program that checks its output and separates some part of it into another file. My solution is to start the app like
./externalApp 2>&1 | myApp
myApp is a C++ app with the following source:
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main ()
{
    string str;
    ofstream A;
    A.open("A.log");
    ofstream B;
    B.open("B.log");
    A << "test start" << endl;
    int i = 0;
    while (getline(cin, str))
    {
        if (str.find("asdasd") != string::npos)
        {
            A << str << endl;
        }
        else
        {
            B << str << endl;
        }
        ++i;
    }
    A << "test end: " << i << " lines" << endl;
    A.close();
    B.close();
    return 0;
}
The externalApp can crash or be terminated. At that moment myApp gets terminated too, so it doesn't write the last lines and doesn't close the files. The file can be 60 GB or larger, so saving it and processing it afterwards is not an option.
Correction: My problem is that when externalApp crashes it terminates myApp. That means any code after the while block never runs. So the question is: is there a way to keep myApp running even after externalApp has closed?
How can I do this task correctly? I am interested in any other idea for doing this task.

There's nothing wrong with the shown code, and nothing in your question offers any evidence of anything being wrong with it. No evidence was shown that your logging application actually received "the last lines" from that external application. Most likely the external application simply failed to write them to standard output or error before crashing.
The most likely explanation is that your external application checks whether its standard output or error is connected to an interactive terminal; if so, each logged line is followed by an explicit buffer flush. When the external application's standard output is a pipe, no such flushing takes place, so the log messages get buffered up and are flushed only when the application's internal output buffer is full. This is fairly common behavior. Because of that, when the external application crashes, its last logged lines are lost forever: your logger never received them, and it can't do anything about log lines it never read.
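For illustration only (this is not the real externalApp, whose source is unavailable), a tiny program with the behavior described above would look something like this: it flushes each line only when its output is an interactive terminal, so lines written into a pipe can still be sitting in the buffer when it crashes.

    #include <cstdio>
    #include <unistd.h>

    int main()
    {
        for (int i = 0; i < 5; ++i)
        {
            std::printf("log line %d\n", i);
            if (isatty(STDOUT_FILENO))   // interactive terminal: flush every line
                std::fflush(stdout);
            // when stdout is a pipe it is fully buffered, so these lines may
            // still sit in the buffer here; a crash at this point loses them
            // before the reader at the other end of the pipe ever sees them
        }
        return 0;
    }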
In your situation, the only available option is to set up and connect a pseudo-tty device to the external application's standard output and error, making it think it's connected to an interactive terminal, while its output is actually captured by your application.
You can't do this from the shell. You need to write some code to set this up. You can start by reading the pty(7) manual page, which explains the procedure to follow; at the end of it you will have file descriptors that you can attach to your external application.
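A rough sketch of that procedure, assuming Linux/glibc, with error handling trimmed down (the "./externalApp" path is taken from the question):

    #include <cstdio>
    #include <cstdlib>
    #include <fcntl.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main()
    {
        int master = posix_openpt(O_RDWR | O_NOCTTY);   // master end stays with the logger
        if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0)
            return 1;

        pid_t pid = fork();
        if (pid == 0)                                   // child: becomes externalApp
        {
            int slave = open(ptsname(master), O_RDWR);  // slave end looks like a tty
            dup2(slave, STDOUT_FILENO);
            dup2(slave, STDERR_FILENO);
            close(slave);
            close(master);
            execl("./externalApp", "externalApp", (char *) 0);
            _exit(127);                                 // only reached if execl fails
        }

        // parent: read the now line-flushed output from the master end
        char buf[4096];
        ssize_t n;
        while ((n = read(master, buf, sizeof buf)) > 0)
            fwrite(buf, 1, n, stdout);                  // replace with the A.log/B.log filtering

        waitpid(pid, 0, 0);
        return 0;
    }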

If you want your program to deal cleanly with the external program crashing, you will probably need to handle SIGPIPE. The default behaviour of this signal is to terminate the process.
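A minimal sketch of one way to handle it: ignore SIGPIPE at the start of main(), so that a write to a closed pipe fails with EPIPE instead of killing the process.

    #include <csignal>

    int main()
    {
        std::signal(SIGPIPE, SIG_IGN);  // broken-pipe writes now return an error (EPIPE)
                                        // instead of terminating the process
        // ... the filtering loop from the question ...
        return 0;
    }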

So the problem was not that when the first element of the pipe ended it terminated the second. The real problem was that the two apps in the pipeline were launched from a bash script, and when the bash script ended it terminated all of its child processes. I solved it using
signal(SIGHUP, SIG_IGN);
and that way my app ran to the end.
Thank you for all the answers; at least I learned a lot about signals and pipes.

Related

Is std::system run to completion before going to the next line? - C++

Here is a piece of my code:
void Espresso::run()
{
    std::system("/home/espresso-ab-1.0/src/espresso espresso.in > espresso.out");
    std::string line;
    std::ifstream myfile ("espresso.out");
    if (myfile.is_open())
    {
        while ( getline (myfile, line) )
        {
            std::cout << line << '\n';
        }
        myfile.close();
    }
}
I am wondering whether the above code first runs the system command and completely fills the "espresso.out" file, and only then goes to the next line to read it.
If not, how can I make sure the file is fully written before going on to read it?
NOTE: I am restricted to C++03.
Thanks for your prompt answer. I want to update my question with:
- Is it a thread-safe method as well?
std::system is not an async function. So for example, if you'd run:
std::system("sleep 5");
std::cout << "Foo" << std::endl;
"Foo" will be displayed after 5 seconds.
Of course, if you're on Linux you could run it like this: std::system("sleep 5 &"). Then the sleep command will run as a background process and the code following the system call will execute immediately.
That said, I encourage you not to use this function: calling system programs by name is dangerous. Imagine what would happen if someone replaced the sleep binary on your system with their own program. Conclusion: your program will block until the system command has completed, so your file will be ready.
Yes, the command will be fully completed before the std::system call returns.
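A small, C++03-compatible sketch of relying on that: check the return value of std::system before reading the output file (the command string is taken from the question):

    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        // std::system blocks until the command finishes, so espresso.out is
        // complete (or the command has failed) by the time it returns
        int rc = std::system("/home/espresso-ab-1.0/src/espresso espresso.in > espresso.out");
        if (rc != 0)
        {
            std::cerr << "espresso command failed with status " << rc << std::endl;
            return 1;
        }

        std::ifstream myfile("espresso.out");
        std::string line;
        while (std::getline(myfile, line))
            std::cout << line << '\n';
        return 0;
    }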

Why is cerr output faster than cout?

Using cout takes a little bit more time to output the statement, which isn't good for me, but when using cerr the output is faster. Why?
Just trying to help:
- cout -> Regular output (console output)
- cerr -> Error output (console error)
cout is buffered, cerr is not, so cout should be faster in most cases. (Although if you really care about speed, the C output functions such as printf tend to be a lot faster than cout/cerr).
cout and cerr are ostream objects. You can call rdbuf() on them to redirect their output independently wherever you want, from within the application. You can open a network socket, wrap it in a stream buffer and redirect there, if you want.
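For instance, a minimal sketch of that rdbuf() redirection, sending cerr to a file from inside the program (the file name here is made up for the example):

    #include <fstream>
    #include <iostream>

    int main()
    {
        std::ofstream errfile("errors.log");
        std::streambuf* old = std::cerr.rdbuf(errfile.rdbuf()); // cerr now writes to errors.log
        std::cerr << "this line goes to errors.log" << std::endl;
        std::cerr.rdbuf(old);                                    // restore the original buffer
        return 0;
    }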
By default, cout is tied to the application's standard output. By default, the standard output is the screen. You can direct the OS to redirect stdout elsewhere. Or it might do it by itself - the nohup utility in Linux, for example, does. Services in Windows also have their standard streams redirected, I think.
Similarly, cerr is tied to the application's standard error. By default the standard error is also the screen, and you can again redirect stderr elsewhere.
Another issue here is that clog, by default, is buffered like cout, whereas cerr is unit-buffered, meaning it automatically calls flush() after every complete output operation. This is very useful, since it means that the output is not lost in the buffer if the application crashes directly afterwards.
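A small illustration of why that matters: cerr output survives an immediate crash, while cout output may still be sitting in its buffer (the abort() below just simulates a crash):

    #include <cstdlib>
    #include <iostream>

    int main()
    {
        std::cout << "buffered message";    // may still be sitting in cout's buffer
        std::cerr << "unbuffered message";  // cerr is unit-buffered, so this is flushed immediately
        std::abort();                       // simulate a crash: the cout line may never appear
    }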
If you run a program like this:
yourprog > yourfile
What you write to cout will go to yourfile. What you write to cerr will go to your screen. That's usually a good thing. I probably don't want your error messages mixed in with your program output. (Especially if some of your error messages are just warnings or diagnostic stuff).
It's also possible to redirect cout to one file and cerr to another. That's a handy paradigm: I run your program, redirect output to a file, error messages to a different file. If your program returns 0 from main, then I know it's OK to process the output file. If it returns an error code, I know NOT to process the output file. The error file will tell me what went wrong.
References:
- http://www.tutorialspoint.com/cplusplus/cpp_basic_input_output.htm
- http://cboard.cprogramming.com/cplusplus-programming/91613-cout-cerr-clog.html

GDB/DDD: Debug shared library with multi-process application C/C++

I am trying to debug a server application but I am running into some difficulties breaking where I need to. The application is broken up into two parts:
A server application, which spawns worker processes (not threads) to handle incoming requests. The server basically spawns off processes which will process incoming requests first-come first-served.
The server also loads plugins in the form of shared libraries. The shared library defines most of the services the server is able to process, so most of the actual processing is done here.
As an added nugget of joy, the worker processes "respawn" (i.e. exit and a new worker process is spawned) so the PIDs of the children change periodically. -_-'
Basically I need to debug a service that's called within the shared library but I don't know which process to attach to ahead of time since they grab requests ad-hoc. Attaching to the main process and setting a breakpoint hasn't seemed to work so far.
Is there a way to debug this shared library code without having to attach to a process in advance? Basically I'd want to debug the first process that called the function in question.
For the time being I'll probably try limiting the number of worker processes to 1 with no respawn, but it'd be good to know how to handle a scenario like this in the future, especially if I'd like to make sure it still works in the "release" configuration.
I'm running on a Linux platform attempting to debug this with DDD and GDB.
Edit: To help illustrate what I'm trying to accomplish, let me provide a brief proof of concept.
#include <iostream>
#include <stdlib.h>
#include <unistd.h>

using namespace std;

void important_function( const int child_id )
{
    cout << "IMPORTANT(" << child_id << ")" << endl;
}

void child_task( const int child_id )
{
    const int delay = 10 - child_id;
    cout << "Child " << child_id << " started. Waiting " << delay << " seconds..." << endl;
    sleep(delay);
    important_function(child_id);
    exit(0);
}

int main( void )
{
    const int children = 10;
    for (int i = 0; i < children; ++i)
    {
        pid_t pid = fork();
        if (pid < 0) cout << "Fork " << i << " failed." << endl;
        else if (pid == 0) child_task(i);
    }
    sleep(10);
    return 0;
}
This program will fork off 10 processes, each of which sleeps 10 - id seconds before calling important_function, the function I want to debug in the first child that calls it (which should, here, be the last one forked).
Setting follow-fork-mode to child will let me follow through to the first child forked, which is not what I'm looking for. I'm looking for the first child that calls the important function.
Setting detach-on-fork off doesn't help, because it halts the parent process until the forked child exits before it continues to fork the other processes (one at a time, after the last has exited).
In the real scenario, it is also important that I be able to attach to an already running server application that has already spawned its worker processes, and halt on the first of those that calls the function.
I'm not sure if any of this is possible since I've not seen much documentation on it. Basically I want to debug the first process to call this line of code, no matter which process it is. (While it's only my application's processes that will call the code, the problem seems more general: attaching to the first process that calls the code, whatever its origin.)
You can set a breakpoint at fork(), and then issue "continue" commands until the main process's next step is to spawn the child process you want to debug. At that point, set a breakpoint at the function you want to debug, and then issue a "set follow-fork-mode child" command to gdb. When you continue, gdb should hook you into the child process at the function where the breakpoint is.
If you issue the command "set detach-on-fork off", gdb will continue debugging the child processes. The process that hits the breakpoint in the library should halt when it reaches that breakpoint. The problem is that when detach-on-fork is off, gdb halts all the child processes that are forked when they start. I don't know of a way to tell it to keep executing these processes after forking.
A solution to this I believe would be to write a gdb script to switch to each process and issue a continue command. The process that hits the function with the breakpoint should stop.
A colleague offered another solution to the problem of getting each child to continue. You can leave "detach-on-fork" on, insert a print statement in each child process's entry point that prints out its process id, and then give it a statement telling it to wait for a change in a variable, like so:
{
    volatile int foo = 1;
    printf("execute \"gdb -p %u\" in a new terminal\n", (unsigned)getpid());
    printf("once GDB is loaded, give it the following commands:\n");
    printf(" set variable foo = 0\n");
    printf(" c\n");
    while (foo == 1) __asm__ __volatile__ ("":::"memory");
}
Then, start up gdb, start the main process, and pipe the output to a file. With a bash script, you can read in the process IDs of the children, start up multiple instances of gdb, attach each instance to one of the different child processes, and signal each to continue by clearing the variable "foo".

Strange behavior with boost file_sink when forking

I'm observing some strange behavior when I use a file_sink (in boost::iostreams) and then fork() a child process.
The child continues with the same codebase, i.e., there is no exec() call, because this is done as part of daemonizing the process. My full code fully daemonizes the process, of course, but I have omitted the steps that are unnecessary for reproducing the behavior.
The following code is a simplified example that demonstrates the behavior:
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/stream_buffer.hpp>
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <unistd.h>

using namespace std;
namespace io = boost::iostreams;

void daemonize(std::ostream& log);

int main (int argc, char** argv)
{
    io::stream_buffer<io::file_sink> logbuf;
    std::ostream filelog(&logbuf);
    //std::ofstream filelog;

    // Step 1: open log
    if (argc > 1)
    {
        //filelog.open(argv[1]);
        logbuf.open(io::file_sink(argv[1]));
        daemonize(filelog);
    }
    else
        daemonize(std::cerr);
    return EXIT_SUCCESS;
}

void daemonize(std::ostream& log)
{
    log << "Log opened." << endl;

    // Step 2: fork - parent stops, child continues
    log.flush();
    pid_t pid = fork(); // error checking omitted
    if (pid > 0)
    {
        log << "Parent exiting." << endl;
        exit(EXIT_SUCCESS);
    }
    assert(0 == pid); // child continues

    // Step 3: write to log
    sleep(1); // give parent process time to exit
    log << "Hello World!" << endl;
}
If I run this with no argument (e.g., ./a.out), so that it logs to stderr, then I get the expected output:
Log opened.
Parent exiting.
Hello World!
However, if I do something like ./a.out temp; sleep 2; cat temp then I get:
Log opened.
Hello World!
So the parent is somehow no longer writing to the file after the fork. That's puzzle #1.
Now suppose I just move io::stream_buffer<io::file_sink> logbuf; outside of main so that it's a global variable. Doing that and simply running ./a.out gives the same expected output as in the previous case, but writing to a file (e.g., temp) now gives a new puzzling behavior:
Log opened.
Parent exiting.
Log opened.
Hello World!
The line that writes "Log opened." comes before the fork(), so I don't see why it should appear twice in the output. (I even put an explicit flush() immediately before the fork() to make sure that line of output wasn't simply buffered, with the buffer then copied during the fork() and both copies eventually flushed to the stream...) So that's puzzle #2.
Of course, if I comment out the whole fork() process (the entire section labeled as "Step 2") then it behaves as expected for both file and stderr output, and regardless of whether logbuf is global or local to main().
Also, if I switch filelog to be an ofstream instead of stream_buffer<file_sink> (see commented out lines in main()) then it also behaves as expected for both file and stderr output, and regardless of whether filelog/logbuf are global or local to main().
So it really seems that it's an interaction between file_sink and fork() producing these strange behaviors... If anyone has ideas on what may be causing these, I'd appreciate the help!
I think I got it figured out... creating this answer for posterity / anyone who stumbles on this question looking for an answer.
I observed this behavior in boost 1.40, but when I tried it using boost 1.46 everything behaved in the expected manner in all cases, i.e.:
Log opened.
Parent exiting.
Hello World!
So my assumption right now is that this was actually a bug in boost that was fixed somewhere between versions 1.41 and 1.46. I didn't see anything in the release notes that made it really obvious that they found and fixed this bug, but it's possible the release notes discussed fixing some underlying cause and I wasn't able to make the connection between that underlying cause and this scenario.
In any case, the solution seems to be to install a boost version >= 1.46.

using exec to execute a system command in a new process

I am trying to spawn a process that executes a system command while my own program continues, so that the two processes run in parallel. I am working on Linux.
I looked it up online and it sounds like I should use the exec() family. But it doesn't work quite as I expected. For example, in the following code, I only see "before" being printed, but not "done".
I am curious whether I am missing anything?
#include <unistd.h>
#include <iostream>
using namespace std;

int main()
{
    cout << "before" << endl;
    execl("/bin/ls", "/bin/ls", "-r", "-t", "-l", (char *) 0);
    cout << "done" << endl;
}
[UPDATE]
Thank you for your comments, guys. Now my program looks like this. Everything works fine except that at the end I have to press Enter to finish the program. I am not sure why I have to press that last Enter?
#include <unistd.h>
#include <iostream>
using namespace std;

int main()
{
    cout << "before" << endl;
    pid_t pid = fork();
    cout << pid << endl;
    if (pid == 0) {
        execl("/bin/ls", "ls", "-r", "-t", "-l", (char *) 0);
    }
    cout << "done" << endl;
}
You're missing a call to fork. All exec does is replace the current process image with that of the new program. Use fork to spawn a copy of your current process. Its return value will tell you whether it's the child or the original parent that's running. If it's the child, call exec.
Once you've made that change, it only appears that you need to press Enter for the programs to finish. What's actually happening is this: The parent process forks and executes the child process. Both processes run, and both processes print to stdout at the same time. Their output is garbled. The parent process has less to do than the child, so it terminates first. When it terminates, your shell, which was waiting for it, wakes and prints the usual prompt. Meanwhile, the child process is still running. It prints more file entries. Finally, it terminates. The shell isn't paying attention to the child process (its grandchild), so the shell has no reason to re-print the prompt. Look more carefully at the output you get, and you should be able to find your usual command prompt buried in the ls output above.
The cursor appears to be waiting for you to press a key. When you do, the shell prints a prompt, and all looks normal. But as far as the shell was concerned, all was already normal. You could have typed another command before. It would have looked a little strange, but the shell would have executed it normally because it only receives input from the keyboard, not from the child process printing additional characters to the screen.
If you use a program like top in a separate console window, you can watch and confirm that both programs have already finished running before you have to press Enter.
The exec family of functions replaces the current process image with the new executable.
To do what you need, use fork() and have the child process exec the new image, as in the sketch below.
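A minimal sketch of that fork-then-exec pattern, reusing the /bin/ls command from the question; the waitpid() at the end is optional, but it keeps the shell prompt from being buried in the ls output:

    #include <iostream>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    using namespace std;

    int main()
    {
        cout << "before" << endl;

        pid_t pid = fork();
        if (pid < 0)
        {
            cerr << "fork failed" << endl;
            return 1;
        }
        if (pid == 0)                       // child: becomes ls
        {
            execl("/bin/ls", "ls", "-r", "-t", "-l", (char *) 0);
            _exit(127);                     // only reached if execl fails
        }

        // parent keeps running in parallel with ls
        cout << "done" << endl;
        waitpid(pid, 0, 0);                 // reap the child before returning to the shell
        return 0;
    }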
[response to update]
It is doing exactly what you told it to. You don't have to press "enter" to finish the program: it has already exited. The shell has already printed a prompt:
[wally#zenetfedora ~]$ ./z
before
22397
done
[wally#zenetfedora ~]$ 0 << here is the prompt (as well as the pid)
total 5102364
drwxr-xr-x. 2 wally wally 4096 2011-01-31 16:22 Templates
...
The output from ls takes a while, so it buries the prompt. If you want the output to appear in a more logical order, add sleep(1) (or maybe longer) before the "done".
You're missing the part where execl() replaces your current program in memory with /bin/ls.
I would suggest looking at popen(), which will fork and exec a new process, then let you read from or write to it via a pipe. (Or, if you need both read and write, fork() yourself, then exec().)
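A small sketch of that popen() suggestion, running ls and reading its output through the pipe while the calling process keeps control:

    #include <cstdio>
    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        FILE* p = popen("/bin/ls -r -t -l", "r");   // fork + exec, with a pipe back to us
        if (!p)
            return 1;

        char buf[256];
        string output;
        while (fgets(buf, sizeof buf, p))
            output += buf;                          // collect the ls output as it arrives

        int status = pclose(p);                     // reap the child and get its exit status
        cout << output << "ls exit status: " << status << endl;
        return 0;
    }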