Use gdb to debug MPI in multiple screen windows

If I have an MPI program that I want to debug with gdb while being able to see all of the separate processes' outputs, I can use:
mpirun -n <NP> xterm -hold -e gdb -ex run --args ./program [arg1] [arg2] [...]
which is well and good when I have a GUI to play with. But that is not always the case.
Is there a similar set up I can use with screen such that each process gets its own window? This would be useful for debugging in a remote environment since it would allow me to flip between outputs using Ctrl+a n.

I think this answer in the "How do I debug an MPI program?" thread does what you want.
EDITS:
In response to the comment, you can do it somewhat more easily, although succinct isn't exactly the term I would use:
Launch a detached screen via mpirun, running your debugger and process. I've called the session mpi, and I'm passing through my library path because screen strips it and my demo needs it (also, I'm on a Mac, hence lldb and DYLD):
mpirun -np 4 screen -AdmS mpi env DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH lldb demo.out
Then launch a separate screen session, which I've called 'debug':
screen -AdmS debug
Use screen -ls to list the running sessions:
>> screen -ls
There are screens on:
19871.mpi (Detached)
19872.mpi (Detached)
19875.mpi (Detached)
19876.mpi (Detached)
20105.debug (Detached)
Now launch 4 new tabs in the debug session, attaching each to one of the mpi sessions:
screen -S debug -X screen -t tab0 screen -r 19871.mpi
screen -S debug -X screen -t tab1 screen -r 19872.mpi
screen -S debug -X screen -t tab2 screen -r 19875.mpi
screen -S debug -X screen -t tab3 screen -r 19876.mpi
Then simply attach to your debug session with screen -r debug. Now you have 4 tabs, each running a serial instance of the debugger attached to an MPI process, similar to the xterm method you described before. It's not exactly the quickest set of commands, but at least you don't need to modify your code or chase PIDs, etc.
Another method I tried, but which doesn't seem to work:
Launch a detached screen:
screen -AdmS ashell
Launch two mpi processes that start new screen tabs in the detached session, launching lldb with my demo mpi application:
mpirun -np 1 screen -S ashell -X screen -t tab1 env DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH lldb demo.out : -np 1 screen -S ashell -X screen -t tab2 env DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH lldb demo.out
Or, alternatively, just:
mpirun -np 2 screen -S ashell -X screen env DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH lldb demo.out
Then attach to screen with
screen -r ashell
And you'll have 3 tabs: 2 of them running lldb with your program, and one with whatever your standard shell is. Unfortunately, when you try running the programs, each process thinks it's the only one in MPI_COMM_WORLD, and I'm not sure what to do about that...

How do you debug a C/C++ MPI program?
One way is to start a separate terminal and gdb session for each of the processes:
mpirun -n <NP> xterm -hold -e gdb -ex run --args ./program [arg1] [arg2] [...]
where NP is the number of processes.
What if you don't have a GUI handy?
(See below for a handy script.)
This is based on timofiend's answer here.
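Spin up the MPI program in its debugger in a number of screen sessions (gdb's --args option makes gdb pass the remaining words to the program instead of treating them as a core file):
mpirun -np 4 screen -AdmS mpi gdb --args ./parallel_pit_fill.exe one retain ./beauford.tif 500 500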
Spin up a new screen session to access the debugger:
screen -AdmS debug
Load the debugger's screen sessions into the new screen session:
screen -list | #Get list of screen sessions
grep -E "[0-9]+.mpi" | #Extract the relevant ones
awk '{print NR-1,$1}' | #Generate tab #s and session ids, drop rest of the string
xargs -n 2 sh -c '
screen -S debug -X screen -t tab$0 screen -r $1
'
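The xargs step relies on a detail of sh -c: the first word after the command string becomes $0 and the next one becomes $1, so each "N ID" pair printed by awk turns into tab$0 and $1. A small stand-alone sketch of just that mechanism (show_tab_mapping is a hypothetical name; it only echoes the mapping and does not touch screen):

```shell
#!/usr/bin/env bash
# show_tab_mapping: read "N SESSION_ID" pairs on stdin (as produced by the awk
# step above) and print which tab each session would be loaded into.
# With `sh -c 'script' a b`, sh sets $0=a and $1=b, which is how the
# `screen -S debug -X screen -t tab$0 screen -r $1` command gets its values.
show_tab_mapping() {
  xargs -n 2 sh -c 'echo "tab$0 -> $1"'
}

printf '0 19871.mpi\n1 19872.mpi\n' | show_tab_mapping
# tab0 -> 19871.mpi
# tab1 -> 19872.mpi
```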
Jump into the new screen session:
screen -r debug
I've encapsulated the above in a handy script:
#!/bin/bash
if [ $# -lt 2 ]
then
echo "Parallel Debugger Syntax: $0 <NP> <PROGRAM> [arg1] [arg2] [...]"
exit 1
fi
the_time=$(date +%s) #Use this so we can run multiple debugging sessions at once
#(assumes we are only starting one per second)
#The first argument is the number of processes. Everything else is what we want
#to run. Make a new mpi screen for each process; --args makes gdb pass the
#remaining words to the program instead of treating them as a core file.
mpirun -np "$1" screen -AdmS "${the_time}.mpi" gdb --args "${@:2}"
#Create a new screen for debugging from
screen -AdmS "${the_time}.debug"
#The following are used for loading the debuggers into the debugging screen
firstpart="screen -S ${the_time}.debug"
secondpart=' -X screen -t tab$0 screen -r $1'
screen -list | #Get list of mpi screens
grep -E "[0-9]+.${the_time}.mpi" | #Extract the relevant ones
awk '{print NR-1,$1}' | #Generate tab #s and session ids, drop rest of the string
xargs -n 2 sh -c "$firstpart$secondpart"
screen -r "${the_time}.debug" #Enter debugging screen
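One possible refinement, offered as an untested sketch: if screen -list ever comes back with fewer mpi sessions than processes (for example, because some detached sessions have not registered yet when it runs), the script could poll before building the tabs. count_mpi_sessions and wait_for_mpi_sessions are hypothetical helper names, and the default of 50 tries at 0.1 s each is an arbitrary choice.

```shell
#!/usr/bin/env bash
# count_mpi_sessions TAG: read `screen -list` output on stdin and print how
# many sessions named <pid>.<TAG>.mpi it lists.
count_mpi_sessions() {
  grep -cE "^[[:space:]]*[0-9]+\.$1\.mpi[[:space:]]"
}

# wait_for_mpi_sessions TAG NP [TRIES]: poll `screen -list` until at least NP
# matching sessions exist; returns non-zero if they never all appear.
wait_for_mpi_sessions() {
  local tag=$1 np=$2 tries=${3:-50}
  while (( tries-- > 0 )); do
    if (( $(screen -list | count_mpi_sessions "$tag") >= np )); then
      return 0
    fi
    sleep 0.1
  done
  return 1
}

# In the script, this would go just before the `screen -list` pipeline:
#   wait_for_mpi_sessions "$the_time" "$1" || echo "not all sessions started" >&2
```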

You can have a look at tmpi, which automates what the other answers describe, but uses tmux instead of screen.
And as a bonus, it multiplexes your keyboard input to all MPI ranks!

Related

How to popen xterm on OS X from a C++ application?

I do this:
popen("xterm -e ' some bash script ' ","r");
It works fine if I launch my application from a terminal command line.
But if I double-click it in the Finder to launch it (i.e., not from a terminal), the application runs, but the xterm doesn't appear.
(xterm may not be the right solution on OS X; what I want to do is open a terminal from popen, interact with the user inside the terminal, and return the result of this interaction to the main program.)
The output shown in xterm's window (or likely other terminals) will not be read by popen, so that part is unclear. However, you say that works from a terminal window.
Another problem is that the DISPLAY variable needed to run xterm may not be set in the environment where the finder is running. You can work around that by adding a suitable -display option to the command-line. For instance, if your application is running and displaying on the local machine (likely), you could try
popen("xterm -display :0.0 -e ' some bash script ' ","r");
When capturing output from xterm, there are two types of output to consider:
error messages from xterm itself are written to the standard error
the program running inside xterm, e.g., 'some bash script', will write to the xterm window.
For your example, you could capture xterm's error messages in the pipe returned by popen by redirecting xterm's standard error to its standard output, e.g.,
popen("xterm -display :0.0 -e ' some bash script ' 2>&1 ","r");
Capturing the output of the bash script is harder. You could redirect the output of the bash script itself, e.g.,
popen("xterm -display :0.0 -e ' some bash script >mylogfile ' ","r");
but that interferes with interaction. A better solution might be to use the script program, doing something like this:
popen("xterm -display :0.0 -e script mylogfile ' some bash script ' ","r");

Process gets killed after xterm terminates

I want to run xterm terminal in C++ to create a Linux process like this
system("xterm -e adb start-server")
The adb process is created but after that command it gets killed. I was trying to solve this problem by using nohup and screen but nothing works. I know that I have to put the adb process into background, but how to do that with xterm?
Edit:
I'm looking for a solution that will terminate/close the xterm window, but not the adb process. Later I want to run multiple commands in the same xterm window, like
system("xterm -e \"adb start-server; adb connect 192.168.X.XXX;\"");
and I want to see all output (and any errors) in the same xterm.
You can do it like this:
xterm -e /bin/bash -c "adb start-server; /bin/bash"

Have Gnu Screen Pass SIGTERM Signal to Child Processes, Allowing Them To Shut Down Cleanly

We are using Upstart to launch/terminate an in-house developed binary.
In the Upstart configuration file for this binary, we define the script as such:
script
exec su - user -c "screen -D -m -S $product /opt/bin/prog /opt/cfg/$product -v 5 --max_log_size=7"
end script
When the runlevel is set to 5, Upstart launches the script. When the runlevel is set to 3, Upstart terminates the script.
My problem is that Upstart sends a SIGTERM and then a SIGKILL.
The SIGTERM is 'handled' by screen, not by my custom binary, so the signal handlers in our binary don't get the SIGTERM and thus cannot shut down cleanly.
I've verified that the signal handlers in our binary do allow it to shut down cleanly when it is NOT launched via screen.
Turns out I had to approach this from a different perspective and handle it via Upstart. Adding a pre-stop script allowed me to identify the Screen session and then stuff in the commands ("quit\n" and then "y\n") to cleanly shut down the binary that Screen was running.
pre-stop script
SESSID=`ps -elf | grep "/opt/bin/prog /opt/cfg/$product" | grep SCREEN | awk '{print $4}'`
QUIT_CMD="screen -S $SESSID.$product -X stuff \"exit"$'\n'"\""
exec `su spuser -c "$QUIT_CMD"`
QUIT_CMD="screen -S $SESSID.$product -X stuff \"y"$'\n'"\""
exec `su spuser -c "$QUIT_CMD"`
sleep 20
end script
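A different approach, sketched here as an alternative to stuffing keystrokes (it is not what the answer above does): since the binary already shuts down cleanly on SIGTERM when it is not under screen, a pre-stop script could send SIGTERM to the program itself instead of to screen. The catch is that the SCREEN process's own command line also contains /opt/bin/prog /opt/cfg/..., so a plain grep finds both. The hypothetical helper below keeps only processes whose argument list starts with the program path; it assumes procps-style ps -eo pid=,args= output.

```shell
#!/usr/bin/env bash
# pid_of_child CMDLINE: read `ps -eo pid=,args=` output on stdin and print the
# PID of each process whose argument list starts with CMDLINE. The SCREEN
# process wrapping the program also contains CMDLINE in its arguments, but not
# at the start, so it is skipped. CMDLINE is passed through the environment so
# awk does not apply backslash-escape processing to it.
pid_of_child() {
  CMDLINE="$1" awk '{
    args = $0
    sub(/^[ \t]*[0-9]+[ \t]+/, "", args)   # strip the leading PID column
    if (index(args, ENVIRON["CMDLINE"]) == 1) print $1
  }'
}

# Hypothetical use in a pre-stop script (assumes $product is set as in the
# question and that the binary cleans up on SIGTERM):
#   pid=$(ps -eo pid=,args= | pid_of_child "/opt/bin/prog /opt/cfg/$product")
#   [ -n "$pid" ] && kill -TERM $pid
```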

xterm window cannot be held open on Linux; it appears and then disappears very fast

I am debugging an MPI C++ program on Linux with GDB.
I cannot use the following command:
xterm -e gdb mpirun -np 1 ./myApplication
to open a window for the executable program ./myApplication: the xterm terminal appears and then disappears immediately.
Why does this happen?
I can open an xterm with:
xterm or xterm -e gdb.
Any help is really appreciated.
@chatan almost got it right.
If you want to invoke gdb on a program while passing arguments to that program, you need to use gdb's --args option. For example (I don't have mpirun, so I'll use /bin/echo):
$ gdb --args /bin/echo hello
[...]
Reading symbols from /bin/echo...(no debugging symbols found)...done.
(gdb) run
Starting program: /bin/echo hello
hello
Program exited normally.
gdb doesn't automatically start running the program; it waits for input.
Without the --args option, gdb takes -np as a gdb option, not as an argument to mpirun. Since gdb doesn't have a -np option, it terminates with an error message:
$ gdb mpirun -np 1 ./myApplication
gdb: unrecognized option '-np'
Use `gdb --help' for a complete list of options.
And when you run xterm -e gdb mpirun -np 1 ./myApplication, xterm runs, it invokes gdb, gdb terminates with an error message, and xterm terminates before you get a chance to see the message.
So this should do the trick:
xterm -e gdb --args mpirun -np 1 ./myApplication
Of course you'll still have to type the run command within gdb to invoke mpirun. (If you're using gdb, you probably already know that.)
For future reference, if you have problems running a program under xterm -e, try running it by itself.
Your command is not going to work the way you expect it to anyway. gdb will ignore the arguments after 'mpirun'. And a naked mpirun command, without any arguments, is going to immediately exit (just try running mpirun by hand in a terminal). Since your xterm was started to execute that one command, it disappears after that process is finished.
What you need to do is open an xterm, then run the "gdb mpirun" command.
You should end up in gdb command prompt. At this prompt, you need to issue the following command:
(gdb) run -np 1 ./myApplication
Now your application should be running inside gdb.

Use GDB to debug a C++ program called from a shell script

I have an extremely complicated shell script that calls a C++ program I want to debug via GDB. It is extremely hard to separate this C++ program from the shell script, since the script has a lot of branches and sets a lot of environment variables.
Is there a way to invoke GDB on this shell script? Looks like gdb requires me to call on a C++ program directly.
In addition to the options mentioned by @diverscuba23, you could do the following:
gdb --args bash <script>
(assuming it's a bash script. Else adapt accordingly)
There are two options:
Invoke GDB directly within the shell script. This assumes that standard input and standard output are not redirected.
Run the shell script and then attach the debugger to the already running C++ process like so: gdb progname 1234 where 1234 is the process ID of the running C++ process.
If you need to do things before the program starts running then option 1 would be the better choice, otherwise option 2 is the cleaner way.
Modify the c++ application to print its pid and sleep 30 seconds (perhaps based on environment or an argument). Attach to the running instance with gdb.
I would probably modify the script to always call gdb (and revert this later) or add an option to call gdb. This will almost always be the easiest solution.
The next easiest would be to temporarily move your executable and replace it with a shell script that runs gdb on the moved program. For example, in the directory containing your program:
$ mv program _program
$ (echo '#!/bin/sh'; echo "exec gdb --args $PWD/_program \"\$@\"") > program
$ chmod +x program
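The same trick can be packaged as a pair of functions so it is easy to undo afterwards. This is a sketch; wrap_with_gdb and unwrap_gdb are hypothetical names, and the wrapper forwards whatever arguments the shell script passes to the real binary via gdb's --args. gdb still needs a usable terminal, so this only helps if the script does not redirect the program's standard input and output.

```shell
#!/usr/bin/env bash
# wrap_with_gdb PROGRAM: move PROGRAM to _PROGRAM (same directory) and put a
# small sh script in its place that runs the real binary under gdb,
# forwarding all of its arguments via --args.
wrap_with_gdb() {
  local dir base
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  base=$(basename "$1")
  mv "$dir/$base" "$dir/_$base" || return 1
  printf '#!/bin/sh\nexec gdb --args "%s" "$@"\n' "$dir/_$base" > "$dir/$base"
  chmod +x "$dir/$base"
}

# unwrap_gdb PROGRAM: undo wrap_with_gdb by moving _PROGRAM back into place
# (this overwrites the wrapper script).
unwrap_gdb() {
  local dir base
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  base=$(basename "$1")
  mv "$dir/_$base" "$dir/$base"
}

# Usage, from the directory containing the program:
#   wrap_with_gdb ./program   # run the shell script, debug, quit gdb
#   unwrap_gdb ./program
```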
Could you just temporarily add gdb to your script?
Although the answers given are valid, sometimes you don't have permission to change the script to execute gdb, or to modify the program to add additional output so you can attach by PID.
Luckily, there is yet another way, through the power of bash:
Use ps, grep, and awk to pick out the PID for you after the program has been started. You can do this either by wrapping the other script with your own or by just executing a command yourself.
That command might look something like this:
process.sh
#!/usr/bin/env bash
#setup for this example
#this will execute vim (with cmdline options) as a child to bash
#we will attempt to attach to this process
vim ~/.vimrc
To get gdb to attach, we'd just need to execute the following:
gdb --pid $(ps -ef | grep -ve grep | grep vim | awk '{print $2}')
I use ps -ef here to list the processes and their arguments. Sometimes, you'll have multiple instances of a program running and need to further grep down to the one you want
the grep -ve grep is there because with the -f option, ps shows full command lines, so the next grep in the pipeline (grep vim) would show up in its own results. If you don't need the command arguments for additional filtering, don't include the -f option for ps and ignore this piece
grep vim is where we're finding our desired process. If you needed more filtering, you could just do something like grep -E "vim.*vimrc" and filter down to exactly the process that you're trying to attach to
awk '{print $2}' simply outputs just the process' pid to stdout. Use $1 if you're using ps -e instead of ps -ef
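The grep -ve grep step can be avoided by doing the matching in a single awk call and handing it the pattern through the environment: environment variables do not appear in ps -ef output, so the awk process cannot match itself. A sketch (pid_matching is a hypothetical name; it assumes the usual 8-column ps -ef layout, UID PID PPID C STIME TTY TIME CMD, with no spaces inside the first seven columns):

```shell
#!/usr/bin/env bash
# pid_matching REGEX: read `ps -ef` output on stdin and print the PID (column 2)
# of every process whose command (column 8 onward) matches REGEX.
pid_matching() {
  PAT="$1" awk 'NR > 1 {
    cmd = $0
    # remove the first seven columns (UID PID PPID C STIME TTY TIME)
    for (i = 1; i <= 7; i++) sub(/^[ \t]*[^ \t]+/, "", cmd)
    sub(/^[ \t]+/, "", cmd)
    if (cmd ~ ENVIRON["PAT"]) print $2
  }'
}

# Usage:
#   gdb --pid "$(ps -ef | pid_matching 'vim.*vimrc')"
```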
My normal setup is to run such a script, which starts my process, in one tmux pane, with something similar to the above already typed into a pane below it. That way, if I need to adjust the filtering (for whatever reason), I can do it pretty quickly.
Usually, though, the filtering will be the same for a specific instance, and I just want to attach automatically after the process has been started. I'll do the following instead:
runGdb.sh
#!/usr/bin/env bash
./process.sh &
sleep 1 #give process.sh a moment to start its child before searching for it
PID=$(ps -ef | grep -ve grep | grep -E "vim.*vimrc" | awk '{print $2}')
#or
#PID=$(ps -e | grep vim | awk '{print $1}')
gdb --pid $PID
This assumes that the original process can be safely run in the background.