In macOS, I find that SIGABRT won't generate core dumps in some cases.
For example, I run a sleep in one terminal:
lianxin.wlx#mbp [01:08:21] [~/test]
-> % sleep 1000
And send a SIGABRT to it in another terminal:
lianxin.wlx#mbp [01:08:59] [~]
-> % ps -ef | grep sleep
502 47679 20388 0 1:08AM ttys001 0:00.01 sleep 1000
lianxin.wlx#mbp [01:09:03] [~]
-> % kill -6 47679
Then the sleep process is aborted, but no core dump is generated.
lianxin.wlx#mbp [01:08:21] [~/test]
-> % sleep 1000
[1] 47679 abort sleep 1000
lianxin.wlx#mbp [01:10:35] [~/test]
-> % ls /cores
lianxin.wlx#mbp [01:10:37] [~/test]
-> %
So why? I've tested the same operations in Linux, it did generate a core dump.
I'm sure I've opened the core dump right(ulimit -c unlimited, and /cores's privilege is 777). I wrote a program that will crash with SIGSEGV, and it did generate a core dump in /cores.
If you make a simple program,
main() {
abort();
}
It will generate a core dump if run with appropriate priv.
Also, if you make a:
main() {
sleep(100);
}
run it in the background and kill -ABRT , it will generate a core dump.
But /bin/sleep doesn't, which is a bit odd.
This is assuming you have followed the recipe in man core.
Related
How do I force a process to core dump on RHEL 6?
I tried kill -3 , but the process is still running.
kill -SIGSEGV kills the process, but no core is generated :
terminate called after throwing an instance of 'omni_thread_fatal'
EVServices: ./../../../rw/db/dbref.h:251: T *RWDBCountedRef<T>::operator->() const [with T = RWDBHandleImp]: Assertion `(impl_) != 0' failed.
/evaluate/ev_dev87/shl/StartProcess.sh[69]: wait: 35225: Killed
Thu Dec 5 11:14:03 EST 2013 Exited EVServices, pid=35225, with ERROR returncode=265 signal=SIGKILL
Please tell me what else I can try to force a process to core.
Use SIGABRT to generate a core dump: kill -6 <pid>
This requires the running process to be allowed to write core dumps, issue ulimit -c unlimited in the same shell as the one used to run your program, before running that program.
I'm wondering if it's possible to launch an application via GDB, on a SegFault write the backtrace to a file (to look at later), and then exit GDB all without any user input.
I'm running an application from a shell script in an infinite loop (so if it crashes it reloads) on OS boot from a non-interactive session. The application is crashing in a non-reproducible way so I need a backtrace from the crash to debug the issue. Ideally, I'd just modify the shell script to include the GDB debugging + backtracing functionality and preserve the automatic restarting of the application following a crash.
Is this possible to do?
Thanks to Aditya Kumar; acceptable solution:
gdb -batch -ex "run" -ex "bt" ${my_program} 2>&1 | grep -v ^"No stack."$
If the program needs arguments:
gdb -batch -ex "run" -ex "bt" --args ${my_program} param1 param2 \
param3 ... 2>&1 | grep -v ^"No stack."$
gdb --batch -q <debuged_executable> <core_file> -ex bt
This works with gdb 7.6:
My test program that causes a core dump if it is given a command line parameter:
int a(int argc)
{
if (argc > 1) {
int *p = 0;
*p = *p +1;
return *p;
}
else {
return 0;
}
}
int b(int argc)
{
return a(argc);
}
int main(int argc, char *argv[])
{
int res = b(argc);
return res;
}
My python script my_check.py:
def my_signal_handler (event):
if (isinstance(event, gdb.SignalEvent)):
log_file_name = "a.out.crash." + str(gdb.selected_inferior().pid) + ".log"
gdb.execute("set logging file " + log_file_name )
gdb.execute("set logging on")
gdb.execute("set logging redirect on")
gdb.execute("thread apply all bt")
gdb.execute("q")
gdb.events.stop.connect(my_signal_handler)
gdb.execute("set confirm off")
gdb.execute("set pagination off")
gdb.execute("r")
gdb.execute("q")
So, first I run a.out and there is no crash. No log files are created:
gdb -q -x my_check.py --args ./a.out >/dev/null
Next I run a.out and give it one parameter:
>gdb -q -x my_check.py --args ./a.out 1 >/dev/null
And this is a crash report:
>cat a.out.crash.13554.log
Thread 1 (process 13554):
#0 0x0000000000400555 in a (argc=2) at main.cpp:5
#1 0x000000000040058a in b (argc=2) at main.cpp:15
#2 0x00000000004005a3 in main (argc=2, argv=0x7fffffffe198) at main.cpp:20
Alternatively to just storing the backtrace, you could put ulimit -c unlimited in front of your infinite loop in your shell script. The result will be that every time your program segfaults, it will write a core dump into a file which on my system is just called core but on other systems might include the process id. If the program segfaults (you see this from its exit status being equal to 139) then just move the core file to a safe location using a unique name (for example using timestamps). With these core files and gdb you can then do even more than just look at the backtrace. Thus I guess using them might even be more useful to you.
I seem to have some kind of multithreading bug in my code that makes it crash once every 30 runs of its test suite. The test suite is non-interactive. I want to run my test suite in gdb, and have gdb exit normally if the program exits normally, or break (and show a debugging prompt) if it crashes. This way I can let the test suite run repeatedly, go grab a cup of coffee, come back, and be presented with a nice debugging prompt. How can I do this with gdb?
This is a little hacky but you could do:
gdb -ex='set confirm on' -ex=run -ex=quit --args ./a.out
If a.out terminates normally, it will just drop you out of GDB. But if you crash, the program will still be active, so GDB will typically prompt if you really want to quit with an active inferior:
Program received signal SIGABRT, Aborted.
0x00007ffff72dad05 in raise (sig=...) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
A debugging session is active.
Inferior 1 [process 15126] will be killed.
Quit anyway? (y or n)
Like I said, not pretty, but it works, as long as you haven't toggled off the prompt to quit with an active process. There is probably a way to use gdb's quit command too: it takes a numeric argument which is the exit code for the debugging session. So maybe you can use --eval-command="quit stuff", where stuff is some GDB expression that reflects whether the inferior is running or not.
This program can be used to test it out:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
if (time(NULL) % 2) {
raise(SIGINT);
}
puts("no crash");
return EXIT_SUCCESS;
}
You can also trigger a backtrace when the program crashes and let gdb exit with the return code of the child process:
gdb -return-child-result -ex run -ex "thread apply all bt" -ex "quit" --args myProgram -myProgramArg
The easiest way is to use the Python API offered by gdb:
def exit_handler(event):
gdb.execute("quit")
gdb.events.exited.connect(exit_handler)
You can even do it with one line:
(gdb) python gdb.events.exited.connect(lambda x : gdb.execute("quit"))
You can also examine the return code to ensure it's the "normal" code you expected with event.exit_code.
You can use it in conjuction with --eval-command or --command as mentioned by #acm to register the event handler from the command line, or with a .gdbinit file.
Create a file named .gdbinit and it will be used when gdb is launched.
run
quit
Run with no options:
gdb --args prog arg1...
You are telling gdb to run and quit, but it should stop processing the file if an error occurs.
Make it dump core when it crashes. If you're on linux, read the man core man page and also the ulimit builtin if you're running bash.
This way when it crashes you'll find a nice corefile that you can feed to gdb:
$ ulimit -c unlimited
$ ... run program ..., gopher coffee (or reddit ;)
$ gdb progname corefile
If you put the following lines in your ~/.gdbinit file, gdb will exit when your program exits with a status code of 0.
python
def exit_handler ( event ):
if event .exit_code == 0:
gdb .execute ( "quit" )
gdb .events .exited .connect ( exit_handler )
end
The above is a refinement of Kevin's answer.
Are you not getting a core file when it crashes? Start gdb like this 'gdb -c core' and do a stack traceback.
More likely you will want to be using Valgrind.
I have a process which suddenly hanged and is not giving any core dump and is also not killed.i can see it still running using the ps command.
how can i know which statement it is currently executing inside the code.
basically i want to know where exactly it got hanged.
language is c++ and platform is solaris unix.
demos.283> cat test3.cc
#include<stdio.h>
#include<unistd.h>
int main()
{
sleep(100);
return 0;
}
demos.284> CC test3.cc
demos.285> ./a.out &
[1] 2231
demos.286> ps -o "pid,wchan,comm"
PID WCHAN COMMAND
23420 fffffe86e9a5aff6 -tcsh
2345 - ps
2231 ffffffffb8ca3376 ./a.out
demos.290> ps
PID TTY TIME CMD
3823 pts/36 0:00 ps
23420 pts/36 0:00 tcsh
3822 pts/36 0:00 a.out
demos.291> pstack 3822
3822: ./a.out
fed1a215 nanosleep (80478c0, 80478c8)
080508ff main (1, 8047920, 8047928, fed93ec0) + f
0805085d _start (1, 8047a4c, 0, 8047a54, 8047a67, 8047c05) + 7d
demos.292>
You have several options: the easiest is to check the WCHAN wait channel that the process is sleeping on:
$ ps -o "pid,wchan,comm"
PID WCHAN COMMAND
2350 wait bash
20639 hrtime i3status
20640 poll_s dzen2
28821 - ps
This can give you a good indication of what the process is doing and is very easy to get.
You can use ktruss and ktrace or DTrace to trace your process. (Sorry, no Solaris here, so no examples.)
You can also attach gdb(1) to your process:
# gdb -p 20640
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
...
(gdb) bt
#0 0x00007fd1a99fd123 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000405533 in ?? ()
#2 0x00007fd1a993deff in __libc_start_main (main=0x4043e3, argc=13, ubp_av=0x7fff25e7b478,
...
The backtrace is often the single most useful error report you can get from a process, so it is worth installing gdb(1) if it isn't already installed. gdb(1) can do a lot more than just show you backtraces, but a full tutorial is well outside the scope of Stack Overflow.
you can try with pstack passing pid as parameter. You can use ps to get the process id (pid)
For example: pstack 1267
Is there a way to configure the directory where core dump files are placed for a specific process?
I have a daemon process written in C++ for which I would like to configure the core dump directory. Optionally the filename pattern should be configurable, too.
I know about /proc/sys/kernel/core_pattern, however this would change the pattern and directory structure globally.
Apache has the directive CoreDumpDirectory - so it seems to be possible.
No, you cannot set it per process. The core file gets dumped either to the current working directory of the process, or the directory set in /proc/sys/kernel/core_pattern if the pattern includes a directory.
CoreDumpDirectory in apache is a hack, apache registers signal handlers for all signals that cause a core dump , and changes the current directory in its signal handler.
/* handle all varieties of core dumping signals */
static void sig_coredump(int sig)
{
apr_filepath_set(ap_coredump_dir, pconf);
apr_signal(sig, SIG_DFL);
#if AP_ENABLE_EXCEPTION_HOOK
run_fatal_exception_hook(sig);
#endif
/* linuxthreads issue calling getpid() here:
* This comparison won't match if the crashing thread is
* some module's thread that runs in the parent process.
* The fallout, which is limited to linuxthreads:
* The special log message won't be written when such a
* thread in the parent causes the parent to crash.
*/
if (getpid() == parent_pid) {
ap_log_error(APLOG_MARK, APLOG_NOTICE,
0, ap_server_conf,
"seg fault or similar nasty error detected "
"in the parent process");
/* XXX we can probably add some rudimentary cleanup code here,
* like getting rid of the pid file. If any additional bad stuff
* happens, we are protected from recursive errors taking down the
* system since this function is no longer the signal handler GLA
*/
}
kill(getpid(), sig);
/* At this point we've got sig blocked, because we're still inside
* the signal handler. When we leave the signal handler it will
* be unblocked, and we'll take the signal... and coredump or whatever
* is appropriate for this particular Unix. In addition the parent
* will see the real signal we received -- whereas if we called
* abort() here, the parent would only see SIGABRT.
*/
}
It is possible to make it using the "|command" mechanism of the core_pattern file. The executed command can create the directories and files as needed. The command can be passed the following specifiers in the parameters (cf. man 5 core):
%% a single % character
%c core file size soft resource limit of crashing process
%d dump mode—same as value returned by prctl(2) PR_GET_DUMPABLE
%e executable filename (without path prefix)
%E pathname of executable, with slashes ('/') replaced by exclamation marks ('!')
%g (numeric) real GID of dumped process
%h hostname (same as nodename returned by uname(2))
%i TID of thread that triggered core dump, as seen in the PID namespace in which the thread resides
%I TID of thread that triggered core dump, as seen in the initial PID namespace
%p PID of dumped process, as seen in the PID namespace in which the process resides
%P PID of dumped process, as seen in the initial PID namespace
%s number of signal causing dump
%t time of dump, expressed as seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)
%u (numeric) real UID of dumped process
For example, it possible to create a script (e.g. named crash.sh) as follow:
#!/bin/bash
# $1: process number on host side (%P)
# $2: program's name (%e)
OUTDIR=/tmp/core/$2
OUTFILE="core_$1"
# Create a sub-directory in /tmp
mkdir -p "$OUTDIR"
# Redirect stdin in a per-process file:
cat > "$OUTDIR"/"$OUTFILE"
exit 0
In the shell:
$ chmod +x crash.sh
$ mv crash.sh /tmp # Put the script in some place
$ sudo su
# echo '|/tmp/crash.sh %P %e' > /proc/sys/kernel/core_pattern
# cat /proc/sys/kernel/core_pattern
|/tmp/crash.sh %P %e
# exit
$
Create an example program which crashes (e.g. fail.c):
int main(void)
{
char *ptr = (char *)0;
*ptr = 'q';
return 0;
}
Compile the program (make several executables) and adjust the core file size in the current shell:
$ gcc fail.c -o fail1
$ gcc fail.c -o fail2
$ ulimit -c
0
$ ulimit -c unlimited
$ ulimit -c
unlimited
Run the failing programs several times to have multiple processes ids:
$ ./fail1
Segmentation fault (core dumped)
$ ./fail2
Segmentation fault (core dumped)
$ ./fail1
Segmentation fault (core dumped)
$ ./fail2
Segmentation fault (core dumped)
Look at /tmp where the core_pattern redirect the core dumps:
$ ls -l /tmp/core
total 8
drwxrwxrwx 2 root root 4096 nov. 3 15:57 fail1
drwxrwxrwx 2 root root 4096 nov. 3 15:57 fail2
$ ls -l /tmp/core/fail1/
total 480
-rw-rw-rw- 1 root root 245760 nov. 3 15:57 core_10606
-rw-rw-rw- 1 root root 245760 nov. 3 15:57 core_10614
$ ls -l /tmp/core/fail2
total 480
-rw-rw-rw- 1 root root 245760 nov. 3 15:57 core_10610
-rw-rw-rw- 1 root root 245760 nov. 3 15:57 core_10618