I have a process which suddenly hanged and is not giving any core dump and is also not killed.i can see it still running using the ps command.
how can i know which statement it is currently executing inside the code.
basically i want to know where exactly it got hanged.
language is c++ and platform is solaris unix.
demos.283> cat test3.cc
#include<stdio.h>
#include<unistd.h>
int main()
{
sleep(100);
return 0;
}
demos.284> CC test3.cc
demos.285> ./a.out &
[1] 2231
demos.286> ps -o "pid,wchan,comm"
PID WCHAN COMMAND
23420 fffffe86e9a5aff6 -tcsh
2345 - ps
2231 ffffffffb8ca3376 ./a.out
demos.290> ps
PID TTY TIME CMD
3823 pts/36 0:00 ps
23420 pts/36 0:00 tcsh
3822 pts/36 0:00 a.out
demos.291> pstack 3822
3822: ./a.out
fed1a215 nanosleep (80478c0, 80478c8)
080508ff main (1, 8047920, 8047928, fed93ec0) + f
0805085d _start (1, 8047a4c, 0, 8047a54, 8047a67, 8047c05) + 7d
demos.292>
You have several options: the easiest is to check the WCHAN wait channel that the process is sleeping on:
$ ps -o "pid,wchan,comm"
PID WCHAN COMMAND
2350 wait bash
20639 hrtime i3status
20640 poll_s dzen2
28821 - ps
This can give you a good indication of what the process is doing and is very easy to get.
You can use ktruss and ktrace or DTrace to trace your process. (Sorry, no Solaris here, so no examples.)
You can also attach gdb(1) to your process:
# gdb -p 20640
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
...
(gdb) bt
#0 0x00007fd1a99fd123 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000405533 in ?? ()
#2 0x00007fd1a993deff in __libc_start_main (main=0x4043e3, argc=13, ubp_av=0x7fff25e7b478,
...
The backtrace is often the single most useful error report you can get from a process, so it is worth installing gdb(1) if it isn't already installed. gdb(1) can do a lot more than just show you backtraces, but a full tutorial is well outside the scope of Stack Overflow.
you can try with pstack passing pid as parameter. You can use ps to get the process id (pid)
For example: pstack 1267
Related
Ubuntu 16.04.4 LTS, GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
I am trying to call a function in a compiled C program and get the following:
"(gdb) call getVarName(someParam)
You can't do that without a process to debug."
There are no other codes or messages.
I can run the program from the shell prompt
jef#ubuntu$ ./program.
I can run the program within gdb after designating the file. Permissions are 777 (just to cover all bases).
Based on research, I set the SHELL with "export SHELL=/bin/bash"
and
set kernal.yama.ptrace_scope = 0 in /etc/sysctl.d/10-ptrace.conf
I still get the same behavior.
I still get the same behavior.
Naturally.
The error you are getting means: you can't do this, unless you are debugging a live process.
This will work:
(gdb) break main
(gdb) run
... GDB is now stopped, *and* you have a live process.
... you *can* call getVarName(...) now
(gdb) call getVarName(...)
(gdb) continue # causes the process to run to end and exit
[Inferior 1 (process 195969) exited normally]
(gdb) # Now you no longer have a live process, so you *again* can't
# call functions in it.
I needed to debug a program asynchronously, because it stalled, and Ctrl+C killed gdb, rather than interrupting the program (this is on MinGW/MSYS).
Someone hinted that gdb wouldn't work on Windows in async mode, and indeed it didn't (with the Asynchronous execution not supported on this target. message), but that gdbserver would.
So I try:
$ gdbserver localhost:60000 ./a_.exe 0
Process ./a_.exe created; pid = 53644
Listening on port 60000
(Supplying the 0 as the argument, according to how the manpage says it's done.)
Then in another terminal:
$ gdb ./a_.exe
(gdb) target remote localhost:60000
Remote debugging using localhost:60000
0x76fa878f in ntdll!DbgBreakPoint () from C:\Windows\system32\ntdll.dll
(gdb) continue
Continuing.
[Inferior 1 (Remote target) exited with code 01]
While the original now looks like:
$ gdbserver localhost:60000 ./a_.exe 0
Process ./a_.exe created; pid = 53484
Listening on port 60000
Remote debugging from host 127.0.0.1
Expecting 1 argument: test case number to run.
Child exited with status 1
GDBserver exiting
That is, my program thought that it got no arguments.
Is the manpage wrong?
Yes! "Misleading" is a more fitting term. (Misleading, at least as it applies to this version of gdbserver on this platform.)
The first argument is literally the first argument (argv) given to the inferior. Normally this is the name of the executable. So, the following worked:
$ gdbserver localhost:60000 ./a_.exe whatever 0
That is, the manpage should have said, to be consistent:
target> gdbserver host:2345 emacs emacs foo.txt
I am not able to help you out with gdbserver,
but if you are just looking for a way to interrupt a program running in mingw/msys gdb(Similar to Ctrl+C on linux)
take a look at at Debugbreak
Is it possible to run a process with gdb, modify some memory and then detach from the process afterwards?
I can't start the process from outside of gdb as I need to modify the memory, before the first instruction is executed.
When you detach from a process started with gdb, gdb will hang, but killing gdb from another process makes the debugged process still running.
I currently use the following script to launch the process:
echo '# custom gdb function that finds the entry_point an assigns it to $entry_point_address
entry_point
b *$entry_point_address
run
set *((char *)0x100004147) = 0xEB
set *((char *)0x100004148) = 0xE2
detach # gdb hangs here
quit # quit never gets executed
' | gdb -quiet "$file"
This happens in both of my gdb versions:
GNU gdb 6.3.50-20050815 (Apple version gdb-1824)
GNU gdb 6.3.50-20050815 (Apple version gdb-1822 + reverse.put.as patches v0.4)
I'm pretty sure that you can't detach from an inferior processes that was started directly under gdb, however, something like the following might work for you, this is based on a recent gdb, I don't know how much of this will work on version 6.3.
Create a small shell script, like this:
#! /bin/sh
echo $$
sleep 10
exec /path/to/your/program arg1 arg2 arg3
Now start this up, spot the pid from echo $$, and attach to the shell script like this gdb -p PID. Once attached you can:
(gdb) set follow-fork-mode child
(gdb) catch exec
(gdb) continue
Continuing.
[New process NEW-PID]
process NEW-PID is executing new program: /path/to/your/program
[Switching to process NEW-PID]
Catchpoint 1 (exec'd /path/to/your/program), 0x00007f40d8e9fc80 in _start ()
(gdb)
You can now modify the child process as required. Once you're finished just do:
(gdb) detach
And /path/to/your/program should resume (or start in this case) running.
I seem to have some kind of multithreading bug in my code that makes it crash once every 30 runs of its test suite. The test suite is non-interactive. I want to run my test suite in gdb, and have gdb exit normally if the program exits normally, or break (and show a debugging prompt) if it crashes. This way I can let the test suite run repeatedly, go grab a cup of coffee, come back, and be presented with a nice debugging prompt. How can I do this with gdb?
This is a little hacky but you could do:
gdb -ex='set confirm on' -ex=run -ex=quit --args ./a.out
If a.out terminates normally, it will just drop you out of GDB. But if you crash, the program will still be active, so GDB will typically prompt if you really want to quit with an active inferior:
Program received signal SIGABRT, Aborted.
0x00007ffff72dad05 in raise (sig=...) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
A debugging session is active.
Inferior 1 [process 15126] will be killed.
Quit anyway? (y or n)
Like I said, not pretty, but it works, as long as you haven't toggled off the prompt to quit with an active process. There is probably a way to use gdb's quit command too: it takes a numeric argument which is the exit code for the debugging session. So maybe you can use --eval-command="quit stuff", where stuff is some GDB expression that reflects whether the inferior is running or not.
This program can be used to test it out:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
if (time(NULL) % 2) {
raise(SIGINT);
}
puts("no crash");
return EXIT_SUCCESS;
}
You can also trigger a backtrace when the program crashes and let gdb exit with the return code of the child process:
gdb -return-child-result -ex run -ex "thread apply all bt" -ex "quit" --args myProgram -myProgramArg
The easiest way is to use the Python API offered by gdb:
def exit_handler(event):
gdb.execute("quit")
gdb.events.exited.connect(exit_handler)
You can even do it with one line:
(gdb) python gdb.events.exited.connect(lambda x : gdb.execute("quit"))
You can also examine the return code to ensure it's the "normal" code you expected with event.exit_code.
You can use it in conjuction with --eval-command or --command as mentioned by #acm to register the event handler from the command line, or with a .gdbinit file.
Create a file named .gdbinit and it will be used when gdb is launched.
run
quit
Run with no options:
gdb --args prog arg1...
You are telling gdb to run and quit, but it should stop processing the file if an error occurs.
Make it dump core when it crashes. If you're on linux, read the man core man page and also the ulimit builtin if you're running bash.
This way when it crashes you'll find a nice corefile that you can feed to gdb:
$ ulimit -c unlimited
$ ... run program ..., gopher coffee (or reddit ;)
$ gdb progname corefile
If you put the following lines in your ~/.gdbinit file, gdb will exit when your program exits with a status code of 0.
python
def exit_handler ( event ):
if event .exit_code == 0:
gdb .execute ( "quit" )
gdb .events .exited .connect ( exit_handler )
end
The above is a refinement of Kevin's answer.
Are you not getting a core file when it crashes? Start gdb like this 'gdb -c core' and do a stack traceback.
More likely you will want to be using Valgrind.
I am doing degug for MPI C++ on Linux with GDB.
I cannot use the following command:
xterm -e gdb mpirun -np 1 ./myApplication
to open a window for the executable program ./myApplication: the xterm terminal appears and then disappears immediately.
Why does this happen?
I can open an xterm with:
xterm or xterm -e gdb.
Any help is really appreciated.
#chatan almost got it right.
If you want to invoke gdb on a program while passing arguments to that program, you need to use gdb's --args option. For example (I don't have mpirun, so I'll use /bin/sleep):
$ gdb --args /bin/echo hello
[...]
Reading symbols from /bin/echo...(no debugging symbols found)...done.
(gdb) run
Starting program: /bin/echo hello
hello
Program exited normally.
gdb doesn't automatically start running the program; it waits for input.
Without the --args option, gdb takes -np as a gdb option, not as an argument to mpirun. Since gdb doesn't have a -np option, it terminates with an error message:
$ gdb mpirun -np 1 ./myApplication
gdb: unrecognized option '-np'
Use `gdb --help' for a complete list of options.
And when you run xterm -e gdb mpirun -np 1 ./myApplication, xterm runs, it invokes gdb, gdb terminates with an error message, and xterm terminates before you get a chance to see the message.
So this should do the trick:
xterm -e gdb --args mpirun -np 1 ./myApplication
Of course you'll still have to type the run command within gdb to invoke mpirun. (If you're using gdb, you probably already know that.)
For future reference, if you have problems running a program under xterm -e, try running it by itself.
Your command is not going to work the way you expect it to anyway. gdb will ignore the arguments after 'mpirun'. And a naked mpirun command, without any arguments, is going to immediately exit (just try running mpirun by hand in a terminal). Since your xterm was started to execute that one command, it disappears after that process is finished.
What you need to do is, open an xterm. Then run "gdb mpirun" command.
You should end up in gdb command prompt. At this prompt, you need to issue the following command:
(gdb) run -np 1 ./myApplication
Now your application should be running inside gdb.