I am writing a plugin for a application, occasionally a SIGSEGV would be throw out. However, the application catches the signal SIGSEGV. In other word, The plugin is a dynamical library. The error occurs in my plugin and dynamical library. But the applcation handle the sSIGSEGV and exit normally. So, it is quite difficult for me to debug and get the backtrace of all stack frames. Any idea?
Currently I am using gdb as debug tool.
GDB will catch SIGSEGV before the application does.
What you described in comment to Logan's answer makes no sense.
I suspect what's really happening is that the application creates a new process, and only gets SIGSEGV in that other process, not the one you attached GDB to.
The following commands may be useful if my guess is correct:
(gdb) catch fork
(gdb) catch vfork
(gdb) set follow-fork-mode child
You might also want to edit and expand your question:
how do you know there is a SIGSEGV to begin with?
Posting a log of your interaction with GDB may also prove useful.
Even if the program traps SIGSEGV, gdb should still get it first and give you an opportunity to debug the program. Have you done something like
handle SIGSEGV nostop
in GDB? If so that could be why it is not stopping.
Are you sure that a segfault is actually occurring? Can you duplicate this behavior with another program, or by intentionally causing a segmentation violation?
For example:
$ cat sig.c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
void handle(int n)
{
puts("Bail");
exit(1);
}
int main()
{
signal(SIGSEGV, handle);
int *pi = 0;
*pi = 10;
return 0;
}
$ gcc -g sig.c
$ ./a.out
Bail
$ gdb ./a.out
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) run
Starting program: /home/elcapaldo/a.out
Program received signal SIGSEGV, Segmentation fault.
0x08048421 in main () at sig.c:15
15 *pi = 10;
(gdb) where
#0 0x08048421 in main () at sig.c:15
(gdb) c
Continuing.
Bail
Program exited with code 01.
(gdb) q
Related
I have a C-and-C++-based project I just got to build and link for the first time, and it segfaults on execution. I tried running it in gdb to get a backtrace, and saw this:
gdb) run
Starting program: /home/jon/controlix-code/bin/controlix
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
No stack.
(gdb)
I assume it is crashing before main() is called, but beyond that I don't have a clue. I haven't been able to find much about this type of situation on Google, so I thought I'd ask here.
One approach is to catch all exceptions before running:
catch throw
run
And if that does not help, you may have to single-step through the assembly from the very beginning. But before you do that,
break main
run
and single-step through the code using step and next should lead you to the culprit.
I know I can use core dump file to figure out where the program goes wrong. However, there are some bugs that even you debug it with core file, you still don't know why it goes wrong. So what I want to convey is that the scope of the bugs that gdb and core files can help you to debug is limited. And how limited is that?
For example, I write the following code : (libfoo.c)
#include <stdio.h>
#include <stdlib.h>
void foo(void);
int main()
{
puts("This is a mis-compiled runnable shared library");
return 0;
}
void foo()
{
puts("This is the shared function");
}
The following is the makefile : (Makefile)
.PHONY : all clean
all : libfoo.c
gcc -g -Wall -shared -fPIC -Wl,-soname,$(basename $^).so.1 -o $(basename $^).so.1.0.0 $^; \
#the correct compiling command should be :
#gcc -g -Wall -shared -fPIC -pie -Wl,--export-dynamic,-soname,$(basename $^).so.1 -o $(basename $^).so.1.0.0 $^;
sudo ldconfig $(CURDIR); #this will set up soname link \
ln -s $(basename $^).so.1.0.0 $(basename $^).so #this will set up linker name link;
clean :
-rm libfoo.s*; sudo ldconfig;#roll back
When I ran it ./libfoo.so, I got segmentation fault, and this was because I compiled the runnable shared library in a wrong way. But I wanted to know exactly what was causing the segmentation fault. So I used gdb libfoo.so.1.0.0 corefile, then bt and got the following:
[xhan#localhost Desktop]$ gdb ./libfoo.so core.8326
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/xiaohan/Desktop/libfoo.so.1.0.0...done.
warning: core file may not match specified executable file.
[New LWP 8326]
Core was generated by `./libfoo.so'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000001 in ?? ()
(gdb) bt
#0 0x0000000000000001 in ?? ()
#1 0x00007ffd29cd13b4 in ?? ()
#2 0x0000000000000000 in ?? ()
(gdb) quit
But I still don't know what caused the segmentation fault. Debugging the core file can not give me any clue that the cause of my segmentation fault is that I used a wrong compiling command.
Can anyone help me with debugging this? Or can anyone tell me the scope of the bugs that is impossible to debug even using gdb and core file? Answers that respond to only one question will also be accepted.
Thanks!
IMPORTANT ASSUMPTIONS I AM HOLDING:
Some may ask why I want to make a shared library runnable. I do this because I want to compile a shared library what is similar to /lib64/ld-2.17.so.
Of course you can't rely on gdb telling you the cause of every bugs you have made. For example, if you simply chmod +x nonexecutable and run it, then get a bug(usually this will not dump core file), and try to debug it with gdb, that is somewhat "crazy". However, once an "executable" can be loaded and dumps a core file during runtime, you can use gdb to debug it, and furthermore, FIND CLUES about why the program goes wrong. However, in my problem ./libfoo.so, I am totally lost.
the scope of the bugs that gdb and core files can help you to debug is limited.
Correct: there are several large classes of bugs for which core dump provides little help. The most common (in my experience) are:
Issues that happen at process startup (such as the example you showed).
GDB needs cooperation with the dynamic loader to tell GDB where various ELF images are mmaped in the process space.
When the crash happens in the dynamic loader itself, or before the dynamic loader had a chance to tell GDB where things are, you end up with a very confusing picture.
Various heap corruption bugs.
Usually you can tell that it's likely that heap corruption is the problem (e.g. any crash inside malloc or free is usually a sign of one), but that tells you very little about the root cause of the problem.
Fortunately, tools like Valgrind and Address Sanitizer can often point you straight at the problem.
Various stack overflow bugs.
GDB uses contents of current stack to tell you how you got to the function you are in (backtrace).
But if you overwrite stack memory with garbage, then the record of how you got to where you are is lost. And if you corrupt stack, and then use "grbage" function pointer, then you can end up with a core dump from which you can't tell either where you are, or how you got there.
Various "logical" bugs.
For example, suppose you have a tree data structure, and a recursive procedure to visit its nodes. If your tree is not a proper tree, and has a cycle in it, your visit procedure will run out of stack and crash.
But looking at the crash tells you nothing about where the tree ceased to be a tree and turned into a graph.
Data races.
You may be iterating over elements of std::vector and crash. Examining the vector shows you that it is no longer in valid state.
That often happens when some other thread modifies the vector (or any other data structure) from under you.
Again, the crash stack trace tells you very little where the bug actually is.
I'm trying to get to the bottom of a bug in KDE 5.6. The locker screen breaks no matter how I lock it. Here's the relevant code: https://github.com/KDE/kscreenlocker/blob/master/abstractlocker.cpp#L51
When I run /usr/lib/kscreenlocker_greet --testing, I get an output of:
KCrash: Application 'kscreenlocker_greet' crashing...
Floating point exception (core dumped)
I'm trying to run it with gdb to try and pin the exact location of the bug, but I'm not sure where to set the breakpoints in order to isolate the bug. Should I be looking for calls to KCrash? Or perhaps a raise() call? Can I get gdb to print off the relevant line of code that causes SIGFPE?
Thanks for any advice you can offer.
but I'm not sure where to set the breakpoints in order to isolate the bug
You shouldn't need to set any breakpoints at all: when a process running under GDB encounters a fatal signal (such as SIGFPE), the OS notices that the process is being traced by the debugger, and notifies the debugger (instead of terminating the process). That in turn causes GDB to stop, and prompt you for additional commands. It is at that time that you can look around and understand what caused the crash.
Example:
cat -n t.c
1 #include <fenv.h>
2
3 int foo(double d) {
4 return 1/d;
5 }
6
7 int main()
8 {
9 feenableexcept(FE_DIVBYZERO);
10 return foo(0);
11 }
gcc -g t.c -lm
./a.out
Floating point exception
gdb -q ./a.out
(gdb) run
Starting program: /tmp/a.out
Program received signal SIGFPE, Arithmetic exception.
0x000000000040060e in foo (d=0) at t.c:4
4 return 1/d;
(gdb) bt
#0 0x000000000040060e in foo (d=0) at t.c:4
#1 0x0000000000400635 in main () at t.c:10
(gdb) q
Here, as you can see, GDB stops when SIGFPE is delivered, and allows you to look around and understand the crash.
In your case, you would want to first install debuginfo symbols for KDE, and then run
gdb --args /usr/lib/kscreenlocker_greet --testing
(gdb) run
I am using 3 HP-UX PA RISC machines for testing. My binary is failing on one PA RISC machine where as others it working. Note that, even though binary is executed with version check i.e. it just print version and exit and don't perform any other operation , still binary is giving segmentation fault. what could be probable reason for Segmentation fault. It is important to me to find out root cause of the failure on one box. As program is working on 2 HP-UX, it seems that it is environment issue?
I tried to copy same piece of code (i.e. declare variables, print version and exit) in test program and build with same compilation options but it is working. Here is gdb output for the program.
$ gdb prg_us
Detected 64-bit executable.
Invoking /opt/langtools/bin/gdb64.
HP gdb 5.4.0 for PA-RISC 2.0 (wide), HP-UX 11.00
and target hppa2.0w-hp-hpux11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.4.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) b 5573
Breakpoint 1 at 0x4000000000259e04: file pmgreader.c, line 5573 from /tmp/test/prg_us.
(gdb) r -v
Starting program: /tmp/test/prg_us -v
Breakpoint 1, main (argc=2, argv=0x800003ffbfff05f8) at pmgreader.c:5573
5573 if (argc ==2 && strcmp (argv[1], "-v") == 0)
Current language: auto; currently c++
(gdb) n
5575 printf ("%s", VER);
(gdb) n
5576 exit(0);
(gdb) n
Program received signal SIGSEGV, Segmentation fault
si_code: 0 - SEGV_UNKNOWN - Unknown Error.
0x800003ffbfb9e130 in real_free+0x480 () from /lib/pa20_64/libc.2
(gdb)
What should be probable cause? why it is working on one and not on another?
Just a long shot - are you including both stdio.h and stdlib.h so the prototypes for printf() and exit() are known to the compiler?
Actually, after a bit more thought (and noticing that C++ is in the mix), you may have some static object initialization causing problems (possibly corrupting the heap?).
Unfortunately, it looks like valgrind is not supported on PA-RISC - is there some similar tool on PA-RISC you can run? If not, it might be worthwhile running valgrind on an x64 build of your program if it's not too difficult to set that up.
Michael Burr already hinted at the problem: it's a global object.
Notice that the crash is from a free function. That indicates a memory deallocation, and in turn a destructor. This makes sense given the context: global destructors run after exit(0). A stack trace will show more detail.
I have this output when trying to debug
Program received signal SIGSEGV, Segmentation fault 0x43989029 in
std::string::compare (this=0x88fd430, __str=#0xbfff9060) at
/home/devsw/tmp/objdir/i686-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:253
253 { return memcmp(__s1, __s2, __n); }
Current language: auto; currently c++
Using valgrind I getting this output
==12485== Process terminating with default action of signal 11 (SIGSEGV)
==12485== Bad permissions for mapped region at address 0x0
==12485== at 0x1: (within path_to_my_executable_file/executable_file)
You don't need to use Valgrind, in fact you want to use the GNU DeBugger (GDB).
If you run the application via gdb (gdb path_to_my_executable_file/executable_file) and you've compiled the application with debugging enabled (-g or -ggdb for GNU C/C++ compilers), you can start the application (via run command at the gdb prompt) and once you arrive at the SegFault, do a backtrace (bt) to see what part of your program called std::string::compare which died.
Example (C):
mctaylor#mpc:~/stackoverflow$ gcc -ggdb crash.c -o crash
mctaylor#mpc:~/stackoverflow$ gdb -q ./crash
(gdb) run
Starting program: /home/mctaylor/stackoverflow/crash
Program received signal SIGSEGV, Segmentation fault.
0x00007f78521bdeb1 in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x00007f78521bdeb1 in memcpy () from /lib/libc.so.6
#1 0x00000000004004ef in main (argc=1, argv=0x7fff3ef4d848) at crash.c:5
(gdb)
So the error I'm interested in is located on crash.c line 5.
Good luck.
Just run the app in the debugger. At one point it will die and you will have a stack trace with the information you want.