When I run the core file in gdb, gdb doesn't show where the error is coming from or which line
in the application causes the problem.
I'm using the compiler options -g -DDEBUG -D_DEBUG, but it doesn't seem to help.
Any help would be appreciated, thanks.
You could be smashing your stack. For example, after running the following program
#include <stdio.h>
#include <string.h>
int main(void)
{
    int a[10];
    memset(a, 0, 100 * sizeof a[0]); /* deliberate bug: writes 100 ints into a 10-int array */
    return 0;
}
and then running gdb on the resulting core, you get:
$ gdb oflow core
[...]
Core was generated by `./oflow'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000000 in ?? ()
The output of the where and bt commands isn't terribly useful:
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
OK, problem solved. I had a recursive function that returned a string, but nothing was actually being returned. I still don't understand why no useful debugging info was shown, though. When I step through the code, it shows the line numbers I'm stepping through, so I guess it's because the line where the error happened was missing, and there was no location to point at? When the function concatenated its own result by recursing (using "+="), it would go into the second call but then crash at the end of the function, because nothing was being returned. But shouldn't that have generated an error on the first call, on the line where it didn't return?
thanks.
I'm learning how to debug with gdb on my Mac, and after hitting a segmentation fault I wanted to use it to learn.
I'm using gdb 8.0.1 and gcc 7.2.0, both from Homebrew. I compile with -ggdb and run gdb directly from my makefile with gdb -ex run ./main.
I open the game, open a menu inside it, and when I try to close the menu, the game crashes because I do this in WindowsObject.cpp:
WindowObject_CraftingGrid::~WindowObject_CraftingGrid() {
    for (unsigned i = 0; i < gridSlots_.size(); i++) {
        for (unsigned j = 0; j < gridSlots_[0].size(); i++) { // i++ instead of j++, this leads to the crash
            delete gridSlots_[i][j];
        }
    }
}
Gdb says:
(gdb) bt
#0 0x0000000100023a80 in WindowObject_Image::Draw (this=0x300000000) at src/WindowObjects.cpp:620
#1 0x0000000100023ae2 in WindowObject_Image::setImage (this=0x100a9e980, img=0x0) at src/WindowObjects.cpp:629
#2 0x000000010001d5f7 in WindowMain::AddSection (this=0x100a04ce0, n=28672) at src/Window.cpp:263
#3 0x0000000100033765 in LoadLibrary () at src/main.cpp:781
#4 0x0000000100030b25 in DrawGUI () at src/main.cpp:465
#5 0x0000000100031534 in DrawGUI () at src/main.cpp:501
#6 0x00000001006eae27 in ?? ()
#7 0x0000700001875ef0 in ?? ()
#8 0x00007fff40b796d8 in ?? ()
#9 0x0000000000000000 in ?? ()
And this is totally wrong because it leads to nothing useful to solve the bug, since it does not point to the right objects and lines.
I found this bug with Visual Studio on my Windows machine, because the call stack there was quite clear:
project.exe!std::vector<std::vector>WindowObjects_Slot * //Other stuff
project.exe!WindowObject_CraftingGrid::~WindowObject_CraftingGrid() Line 348
project.exe!WindowMain::~WindowMain() Line 234
project.exe!KeyPressed(int KeyCode) Line 566
project.exe!gameloop() Line 181
project.exe!main(int argc, char ** argv) Line 321
And this is totally wrong
No, it's not: it's where your application actually crashes on this platform.
because it leads to nothing useful to solve the bug
You have a heap corruption bug. Heap corruption bugs are like that: your application may crash some time after heap corruption, in an arbitrary place.
In addition, the stack trace is not useless: it tells you that this == 0x300000000, which is not a reasonable value for this, and therefore you are looking at some kind of heap corruption.
There are many ways to debug similar problems: debug malloc, Address Sanitizer and Valgrind among them.
Building with -D_GLIBCXX_DEBUG enables debugging mode in GCC STL, and would likely also point you straight at the bug.
I want to test the following in the program below: when s == "abc", break inside f() and look at the value of i.
#include <string>
using namespace std;
int i = 0;
void f(const string& s1)
{
    ++i; // line 6
}
int main()
{
    string s = "a";
    s += "b";
    s += "c";
    s += "d";
    s += "e";
    s += "f";
    return 0;
}
It compiles, and a.out runs without problems. Then I debug it:
g++ 1.cpp -g
gdb a.out
...
(gdb) b main if strcmp(s.c_str(),"abc")==0
Breakpoint 1 at 0x400979: file 1.cpp, line 9.
(gdb) r
Starting program: /home/dev/a.out
Program received signal SIGSEGV, Segmentation fault.
__strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31
31 ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: No such file or directory.
Error in testing breakpoint condition:
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(__strcmp_sse2_unaligned) will be abandoned.
When the function is done executing, GDB will silently stop.
Program received signal SIGSEGV, Segmentation fault.
Breakpoint 1, __strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31
31 in ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S
If I change the breakpoint declaration to:
(gdb) b main:6 if s.compare("abc")==0
Breakpoint 1 at 0x400979: file 1.cpp, line 9.
Then I get what seems to be a different kind of crash:
(gdb) r
Starting program: /home/dev/a.out
Program received signal SIGSEGV, Segmentation fault.
__memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:1024
1024 ../sysdeps/x86_64/multiarch/memcmp-sse4.S: No such file or directory.
Error in testing breakpoint condition:
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const) will be abandoned.
When the function is done executing, GDB will silently stop.
Program received signal SIGSEGV, Segmentation fault.
Breakpoint 1, __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:1024
1024 in ../sysdeps/x86_64/multiarch/memcmp-sse4.S
Is this crash caused by gdb, or by my command? If my command has a runtime problem, why doesn't gdb simply report an error, rather than crash the program?
I hope to get some explanation, as I don't understand what causes this error.
What is going on here is that your command:
(gdb) break main:6
... is interpreted by gdb as the same as break main. You can see this by typing the latter as well:
(gdb) b main:6
Breakpoint 1 at 0x400919: file q.cc, line 10.
(gdb) b main
Note: breakpoint 1 also set at pc 0x400919.
Breakpoint 2 at 0x400919: file q.cc, line 10.
Now, this is peculiar because gdb presumably ought to warn you that the trailing :6 is ignored. (I'd recommend filing a bug asking that this be made a syntax error.)
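As for the crash itself: a breakpoint on main is hit before s has been constructed, so to evaluate your condition gdb calls strcmp (or std::string::compare) inside your program on an uninitialized string object. That call segfaults, and gdb reports the signal and abandons the condition. The crash comes from evaluating the condition, not from gdb itself.
If you want to break at a certain line in a file, you must use the source file name. Presumably you meant to type:
(gdb) break 1.cpp:6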
Program terminated with signal 11, Segmentation fault.
#0 0x00007f0412571733 in boost::detail::interruption_checker::~interruption_checker() ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libXyzService.so
Missing separate debuginfos, use: debuginfo-install python-2.6.6-52.x86_64
(gdb) where
#0 0x00007f0412571733 in boost::detail::interruption_checker::~interruption_checker() ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libXyzService.so
#1 0x00007f041181547a in boost::this_thread::sleep(boost::posix_time::ptime const&) () from /usr/lib64/libboost_thread-mt.so.1.41.0
#2 0x00007f040c5ea36c in void boost::this_thread::sleep<boost::posix_time::seconds>(boost::posix_time::seconds const&) ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libAbcLib.so
#3 0x00007f040c5daf63 in healthMonitoring::healthMonitoringController::print(bool) ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libAbcLib.so
#4 0x00007f0411813d10 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.1.41.0
#5 0x000000365d6079d1 in start_thread () from /lib64/libpthread.so.0
#6 0x000000365cee88fd in clone () from /lib64/libc.so.6
(gdb)
As you can see from this backtrace, the seg fault is raised by the loaded shared library libXyzService.so. So how can I know from what point in code of this shared library, this seg fault was raised?
Also, what is the meaning of the addresses shown at the start of each frame?
Please let me know if any more detail is needed.
So how can I know from what point in code of this shared library, this
seg fault was raised?
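Try to rebuild everything from scratch with optimizations disabled (-O0 or -Og) and with debug info enabled (-g), and make sure that you are not stripping the resulting binaries (not running strip on them).
This should give you more meaningful stack traces with line numbers and file names.
As for the addresses at the start of each frame: each one is the program counter in that frame, i.e. the faulting instruction for frame #0 and, for the outer frames, the address where execution would resume in that function once the call it made returns. Once the library has debug info, gdb maps them to file and line itself; you can also query one directly with info symbol 0x00007f0412571733 or info line *0x00007f0412571733 at the (gdb) prompt.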
I use gdb to debug my program. When I unpack a message and want to print it, I run into a problem: I can print the fields from the gdb command line, but when the program reaches printf("%d has received msg: ", msg->connid);, I get this:
Program received signal SIGSEGV, Segmentation fault.
0xb7ff6301 in ?? () from /lib/ld-linux.so.2
(gdb)n
154 LSPMessage* msg = lspmessage__unpack(NULL, msg_len, buf);
(gdb) n
156 memcpy(pld, msg->payload.data, msg->payload.len);
(gdb) p msg->payload.data
$1 = (uint8_t *) 0x804c038 "Connectedrt,\031"
(gdb) p msg->connid
$2 = 1
(gdb) p msg->payload.len
$3 = 9
174 printf("%d has received msg: ", msg->connid); // required field
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0xb7ff6301 in ?? () from /lib/ld-linux.so.2
You did a remarkably poor job of explaining what it is you are actually asking.
I think your question is: "how come my printf call crashes?".
There could be several reasons:
You have corrupted stdout earlier, or
msg->connid is not an int (and so it is wrong to printf it with "%d"), or
you have corrupted something else in the runtime loader state, and it now crashes while doing lazy PLT symbol resolution.
Since your crash is inside the runtime loader, that last cause appears to be the most likely. You can confirm this hypothesis by forcing the loader to perform symbol resolution non-lazily:
(gdb) set env LD_BIND_NOW 1
(gdb) run
Did it still crash? If not, run your program under Valgrind, and be sure to fix all the problems it reports.
I'm trying to find the reason for a segfault that occurs at the level of the system libraries.
I would like to get some hints on how to use gdb to examine args of the getenv() call seen in the following stack trace.
As the trace shows, getenv() is not called directly by my code; the call is nested in a chain of library calls initiated by my code. It starts with my routine a_logmsg() calling the thread-safe localtime_r(), and getenv() is called later somewhere within libc. The OS is Solaris 8/SPARC.
Program terminated with signal 11, Segmentation fault.
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
(gdb) where
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
#1 0xfed46ab0 in getsystemTZ () from /usr/lib/libc.so.1
#2 0xfed44918 in ltzset_u () from /usr/lib/libc.so.1
#3 0xfed44140 in localtime_r () from /usr/lib/libc.so.1
#4 0x00029c28 in a_logmsg (fmt=0xfd5d0 "%s: no changes to config.") at misc.c:155
#5 0x000273b8 in a_sync_device (device_group=0x11e3ed0 "none", hostname=0xfbbffe8d "router",
config_by=0xfbbffc8f "scheduled_archiving", platform=0x11e3ee0 "cisco", authset=0x11e3ef0 "set01",
arch_method=0xffffcfc8 <Address 0xffffcfc8 out of bounds>) at arch.c:256
#6 0x00027ce8 in a_archive_single (arg=0x1606f50) at arch.c:498
#7 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
#8 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I would like to get some hints on how to use gdb to examine args of the getenv() call seen in the following stack trace.
The source for Solaris libc is available here.
You can examine the argument to getenv by setting a breakpoint on it and looking at the registers. You'll need to know the ABI in use, but it's quite simple: the argument to getenv is in register i0, and print (char*)$i0 at the (gdb) prompt should print "TZ".
Finally, the most likely reason for a crash in getenv is that you've corrupted the environment earlier. In particular, note that this code is bad:
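#include <stdlib.h>
#include <string.h>

void buggy()
{
    char buf[80];
    strcpy(buf, "FOO=BAR");
    putenv(buf); // <-- BUG! putenv() keeps this pointer, but buf dies when buggy() returns
}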
You can usually examine the environment with one of these commands:
(gdb) x/100s __environ
(gdb) x/100s environ
Chances are, you'll see strings there which do not contain the = sign.