I am using three HP-UX PA-RISC machines for testing. My binary fails on one of them but works on the other two. Even when the binary is run only for a version check (it just prints the version and exits without performing any other operation), it still gives a segmentation fault. What could be the probable reason for the segmentation fault? It is important for me to find the root cause of the failure on that one box. Since the program works on the other two HP-UX machines, could it be an environment issue?
I copied the same piece of code (i.e. declare variables, print version and exit) into a test program and built it with the same compilation options, and that test program works. Here is the gdb output for the failing program.
$ gdb prg_us
Detected 64-bit executable.
Invoking /opt/langtools/bin/gdb64.
HP gdb 5.4.0 for PA-RISC 2.0 (wide), HP-UX 11.00
and target hppa2.0w-hp-hpux11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.4.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) b 5573
Breakpoint 1 at 0x4000000000259e04: file pmgreader.c, line 5573 from /tmp/test/prg_us.
(gdb) r -v
Starting program: /tmp/test/prg_us -v
Breakpoint 1, main (argc=2, argv=0x800003ffbfff05f8) at pmgreader.c:5573
5573 if (argc ==2 && strcmp (argv[1], "-v") == 0)
Current language: auto; currently c++
(gdb) n
5575 printf ("%s", VER);
(gdb) n
5576 exit(0);
(gdb) n
Program received signal SIGSEGV, Segmentation fault
si_code: 0 - SEGV_UNKNOWN - Unknown Error.
0x800003ffbfb9e130 in real_free+0x480 () from /lib/pa20_64/libc.2
(gdb)
What could be the probable cause? Why does it work on one machine and not on another?
Just a long shot - are you including both stdio.h and stdlib.h so the prototypes for printf() and exit() are known to the compiler?
Actually, after a bit more thought (and noticing that C++ is in the mix), you may have some static object initialization causing problems (possibly corrupting the heap?).
Unfortunately, it looks like valgrind is not supported on PA-RISC - is there some similar tool on PA-RISC you can run? If not, it might be worthwhile running valgrind on an x64 build of your program if it's not too difficult to set that up.
Michael Burr already hinted at the problem: it's a global object.
Notice that the crash is in a free function (real_free in libc). That indicates a memory deallocation, and in turn probably a destructor. This makes sense given the context: destructors of global objects run as part of exit(0), after main has finished its own work. A stack trace (bt) will show more detail.
I know I can use a core dump file to figure out where a program goes wrong. However, there are some bugs where, even when you debug with the core file, you still don't know why it goes wrong. So my point is that the scope of bugs that gdb and core files can help you debug is limited. How limited is it?
For example, I wrote the following code (libfoo.c):
#include <stdio.h>
#include <stdlib.h>
void foo(void);
int main()
{
    puts("This is a mis-compiled runnable shared library");
    return 0;
}
void foo()
{
    puts("This is the shared function");
}
The following is the makefile : (Makefile)
.PHONY : all clean
all : libfoo.c
gcc -g -Wall -shared -fPIC -Wl,-soname,$(basename $^).so.1 -o $(basename $^).so.1.0.0 $^; \
#the correct compiling command should be :
#gcc -g -Wall -shared -fPIC -pie -Wl,--export-dynamic,-soname,$(basename $^).so.1 -o $(basename $^).so.1.0.0 $^;
sudo ldconfig $(CURDIR); #this will set up soname link \
ln -s $(basename $^).so.1.0.0 $(basename $^).so #this will set up linker name link;
clean :
-rm libfoo.s*; sudo ldconfig;#roll back
When I ran it as ./libfoo.so, I got a segmentation fault, because I had compiled the runnable shared library the wrong way. But I wanted to know exactly what was causing the segmentation fault, so I ran gdb libfoo.so.1.0.0 corefile, then bt, and got the following:
[xhan@localhost Desktop]$ gdb ./libfoo.so core.8326
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/xiaohan/Desktop/libfoo.so.1.0.0...done.
warning: core file may not match specified executable file.
[New LWP 8326]
Core was generated by `./libfoo.so'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000001 in ?? ()
(gdb) bt
#0 0x0000000000000001 in ?? ()
#1 0x00007ffd29cd13b4 in ?? ()
#2 0x0000000000000000 in ?? ()
(gdb) quit
But I still don't know what caused the segmentation fault. Debugging the core file gives me no clue that the cause of the segmentation fault is the wrong compile command.
Can anyone help me debug this? Or can anyone tell me the scope of bugs that are impossible to debug even with gdb and a core file? Answers that address only one of the questions will also be accepted.
Thanks!
IMPORTANT ASSUMPTIONS I AM HOLDING:
Some may ask why I want to make a shared library runnable. I do this because I want to compile a shared library that is similar to /lib64/ld-2.17.so.
Of course you can't rely on gdb to tell you the cause of every bug you have made. For example, if you simply chmod +x a non-executable file, run it, hit an error (usually this will not dump a core file), and then try to debug it with gdb, that is somewhat "crazy". However, once an "executable" can be loaded and dumps a core file at runtime, you can use gdb to debug it and, furthermore, FIND CLUES about why the program goes wrong. In the case of ./libfoo.so, however, I am totally lost.
the scope of the bugs that gdb and core files can help you to debug is limited.
Correct: there are several large classes of bugs for which a core dump provides little help. The most common (in my experience) are:
Issues that happen at process startup (such as the example you showed).
GDB needs cooperation with the dynamic loader to tell GDB where various ELF images are mmaped in the process space.
When the crash happens in the dynamic loader itself, or before the dynamic loader had a chance to tell GDB where things are, you end up with a very confusing picture.
Various heap corruption bugs.
Usually you can tell that it's likely that heap corruption is the problem (e.g. any crash inside malloc or free is usually a sign of one), but that tells you very little about the root cause of the problem.
Fortunately, tools like Valgrind and Address Sanitizer can often point you straight at the problem.
Various stack overflow bugs.
GDB uses contents of current stack to tell you how you got to the function you are in (backtrace).
But if you overwrite stack memory with garbage, then the record of how you got to where you are is lost. And if you corrupt the stack and then call through a "garbage" function pointer, you can end up with a core dump from which you can't tell either where you are or how you got there.
Various "logical" bugs.
For example, suppose you have a tree data structure, and a recursive procedure to visit its nodes. If your tree is not a proper tree, and has a cycle in it, your visit procedure will run out of stack and crash.
But looking at the crash tells you nothing about where the tree ceased to be a tree and turned into a graph.
Data races.
You may be iterating over elements of std::vector and crash. Examining the vector shows you that it is no longer in valid state.
That often happens when some other thread modifies the vector (or any other data structure) from under you.
Again, the crash stack trace tells you very little where the bug actually is.
My question sounds specific, but I suspect it may still be a general C++ debugging issue.
I am using omnet++, which simulates wireless networks. omnet++ itself is a C++ program.
I ran into a strange phenomenon when running my program (a modified inet framework with omnet++ 4.2.2 on Ubuntu 12.04): the program exits with exit code 139 (people say this means memory fragmentation) when it reaches a certain part of the code. When I try to debug it, gdb doesn't report anything wrong with the 'problematic' code where the simulation previously exited; in fact, the debug run goes through this part of the code and outputs the expected results.
gdb version info: GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Could anybody tell me why the run fails but debug doesn't?
Many thanks!
exit code 139 (people say this means memory fragmentation)
No, it means that your program died with signal 11 (SIGSEGV on Linux and most other UNIXes), also known as segmentation fault.
Could anybody tell me why the run fails but debug doesn't?
Your program exhibits undefined behavior, and can do anything (that includes appearing to work correctly sometimes).
Your first step should be running this program under Valgrind, and fixing all errors it reports.
If the program still crashes after you have done the above, let it dump core (ulimit -c unlimited; ./a.out) and then analyze that core dump with GDB: gdb ./a.out core, then use the where command.
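This error can also be caused by dereferencing a null pointer.
Using a pointer that was never initialized can cause it as well. Note that a null check only helps when the pointer has been given a known value: an uninitialized pointer holds an indeterminate value, so it cannot be detected this way.
To guard against a null pointer you can try something like:
#include <new>  // std::nothrow
// Plain new throws std::bad_alloc on failure; the nothrow form returns nullptr instead.
Class *pointer = new (std::nothrow) Class();
if (pointer != nullptr) {
    pointer->myFunction();
}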
Here is my sample program:
#include <stdio.h>
int main()
{
    printf("hello good morning \n");
    return 0;
}
gcc -Wall -g temp.c
/opt/langtools/bin/gdb a.out
HP gdb 3.3 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 3.3 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) b 6
Breakpoint 1 at 0x2b14: file temp.c, line 6.
(gdb) run
Starting program: /oo_dgfqausr/test/dfqwrk4/temp/a.out
Breakpoint 1, main () at temp.c:6
6 printf("hello good morning \n");
(gdb) step
hello good morning
7 return 0;
(gdb)
As soon as I try to step into the printf function, gdb steps over it and returns to main.
Does this mean that the shared library in which printf is defined does not come with debug symbols? Or am I doing something wrong?
This means there are no source/debug symbols available for printf. You can use stepi to step into printf anyway, but you'll only have disassembly available (use the disas command).
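That's correct, you likely do not have debugging symbols available. Make sure the debug-symbol package for libc is installed (on Linux distributions it is typically named something like libc6-dbg or glibc-debuginfo; a -devel package alone does not contain the symbols). Also, make sure to compile with -O0 to prevent optimization; optimizations make debugging more difficult to follow.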
Also, -g3 is required for maximum debug information. With -g3, even symbolic constants (macros) will be available. -ggdb may be helpful too. Jan from GDB tells us there are no mainline GDB extensions, but Apple may have offered some and not sent the patches back upstream.
I am writing a plugin for an application, and occasionally a SIGSEGV is raised. However, the application catches SIGSEGV. In other words: the plugin is a dynamic library, the error occurs in my plugin, but the application handles the SIGSEGV and exits normally. So it is quite difficult for me to debug and get a backtrace of all stack frames. Any ideas?
Currently I am using gdb as my debugging tool.
GDB will catch SIGSEGV before the application does.
What you described in comment to Logan's answer makes no sense.
I suspect what's really happening is that the application creates a new process, and only gets SIGSEGV in that other process, not the one you attached GDB to.
The following commands may be useful if my guess is correct:
(gdb) catch fork
(gdb) catch vfork
(gdb) set follow-fork-mode child
You might also want to edit and expand your question:
how do you know there is a SIGSEGV to begin with?
Posting a log of your interaction with GDB may also prove useful.
Even if the program traps SIGSEGV, gdb should still get it first and give you an opportunity to debug the program. Have you done something like
handle SIGSEGV nostop
in GDB? If so that could be why it is not stopping.
Are you sure that a segfault is actually occurring? Can you duplicate this behavior with another program, or by intentionally causing a segmentation violation?
For example:
$ cat sig.c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

void handle(int n)
{
    puts("Bail");
    exit(1);
}

int main()
{
    signal(SIGSEGV, handle);
    int *pi = 0;
    *pi = 10;
    return 0;
}
$ gcc -g sig.c
$ ./a.out
Bail
$ gdb ./a.out
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) run
Starting program: /home/elcapaldo/a.out
Program received signal SIGSEGV, Segmentation fault.
0x08048421 in main () at sig.c:15
15 *pi = 10;
(gdb) where
#0 0x08048421 in main () at sig.c:15
(gdb) c
Continuing.
Bail
Program exited with code 01.
(gdb) q
I won't post any code, because there is too much that could be relevant. But when I run my program it prints
Internal Bad Op Name!
: Success
Does anybody know what that means? I'm using g++ to compile my code, and nowhere in my code do I cout anything even remotely close to that. I don't know where it's coming from. Also, any suggestions on how to figure out where in the code it's coming from, maybe using gdb somehow?
Thanks!
It's not a message I've seen, and Googling for it doesn't show anything obviously related.
You can identify where it comes from by stepping through the program with gdb until the message appears. Alternatively, one can sprinkle some timing delays, "I am here" statements, or input prompts to discover suspect portions of the logic.
< < < (edit) > > >
To use gdb, first be sure to compile and link with debug symbols. With either gcc or g++, just add -g to the command line. It's also often helpful to eliminate any compiler optimizations since those can sometimes make stepping through the program non-intuitive.
[wally@lf ~]$ gdb program
GNU gdb Fedora (6.8-32.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(gdb) break main
Breakpoint 1 at 0x8048c3c: file rtpsim.cpp, line 30.
(gdb) run
Starting program: ~/program
Breakpoint 1, main () at rtpsim.cpp:30
30 rtp_io (&obj, INIT_CYCLE);
(gdb) next
31 printf ("- - - - - init complete - - - - -\n");
(gdb) <---- pressed "enter" to repeat last command
- - - - - init complete - - - - -
33 for (int j = 0; j < 10; ++j)
(gdb)
35 sleep (1);
(gdb)
36 rtp_io (&obj, SCAN_CYCLE);
(gdb)
37 printf ("- - - - - scan %d complete - - - - -\n", j+1);
...
What libraries and what platform are you using? No C++ compiler I know of (certainly not GCC) introduces output to your program except before aborting.
Edit: maybe easier than backtracking or finding references, use grep -a to find that string in all your sources and library binaries.
To debug the program with GDB, first make sure that it is compiled with the -g flag. Then type gdb your-program-name on the command line. GDB is a command-based debugger. To get started, type help. There are also graphical debugging front ends, such as xxgdb (though for this one it helps to understand basic gdb commands), ddd, kdbg (KDE-based), and Eclipse (not very straightforward to configure if you want to use your own makefile).