I'm learning how to debug with gdb on my mac and after finding a segmentation fault I wanted to use it to learn.
I'm using gdb 8.0.1 and gcc 7.2.0 both from homebrew, I'm compiling with -ggdb and running gdb directly from my makefile through gdb -ex run ./main.
I open the game, I open a menu inside it, and when I try close it it crashes because I do this in WindowsObject.cpp :
WindowObject_CraftingGrid::~WindowObject_CraftingGrid(){
for (unsigned i = 0; i < gridSlots_.size(); i++) {
for (unsigned j = 0; j < gridSlots_[0].size(); i++) { //i++ instead of j++, this leads to the crash
delete gridSlots_[i][j];
}
}
}
Gdb says:
(gdb) bt
#0 0x0000000100023a80 in WindowObject_Image::Draw (this=0x300000000) at src/WindowObjects.cpp:620
#1 0x0000000100023ae2 in WindowObject_Image::setImage (this=0x100a9e980, img=0x0) at src/WindowObjects.cpp:629
#2 0x000000010001d5f7 in WindowMain::AddSection (this=0x100a04ce0, n=28672) at src/Window.cpp:263
#3 0x0000000100033765 in LoadLibrary () at src/main.cpp:781
#4 0x0000000100030b25 in DrawGUI () at src/main.cpp:465
#5 0x0000000100031534 in DrawGUI () at src/main.cpp:501
#6 0x00000001006eae27 in ?? ()
#7 0x0000700001875ef0 in ?? ()
#8 0x00007fff40b796d8 in ?? ()
#9 0x0000000000000000 in ?? ()
And this is totally wrong because it leads to nothing useful to solve the bug because it does not point to the right objects and lines.
I discovered this bug from visual studio on my windows machine because the call stack there was quite clear:
project.exe!std::vector<std::vector>WindowObjects_Slot * //Other stuff
project.exe!WindowObject_CraftingGrid::~WindowObject_CraftingGrid() Line 348
project.exe!WindowMain::~WindowMain() Line 234
project.exe!KeyPressed(int KeyCode) Line 566
project.exe!gameloop() Line 181
project.exe!main(int argc, char ** argv) Line 321)
And this is totally wrong
No, it's not: it's where your application actually crashes on this platform.
because it leads to nothing useful to solve the bug
You have a heap corruption bug. Heap corruption bugs are like that: your application may crash some time after heap corruption, in an arbitrary place.
In addition, the stack trace is not useless: it tells you that this == 0x300000000, which is not a reasonable value for this, and therefore you are looking at some kind of heap corruption.
There are many ways to debug similar problems: debug malloc, Address Sanitizer and Valgrind among them.
Building with -D_GLIBCXX_DEBUG enables debugging mode in GCC STL, and would likely also point you straight at the bug.
Related
I'm running into a weird bus error when trying to create an object in C++. This is my gdb backtrace when the program crashes:
#0 0xff146ff4 in _malloc_unlocked () from /usr/lib/libc.so.1
#1 0xff146e40 in malloc () from /usr/lib/libc.so.1
#2 0x24430 in __builtin_new (sz=128) at /usr/local/src/gcc-2.95.1/gcc/cp/new1.cc:84
#3 0x1e71c in FileHeader::Allocate (this=0x3f5d8, freeMap=0x3eea0, fileSize=5719)
at ../filesys/filehdr.cc:63
#4 0x1f61c in FileSystem::Create (this=0x3d8b8, name=0xffbff8f3 "test", initialSize=5719)
at ../filesys/filesys.cc:200
#5 0x1ffac in Copy (from=0xffbff8e4 "assignment 2.c", to=0xffbff8f3 "test")
at ../filesys/fstest.cc:52
#6 0x15150 in main (argc=3, argv=0xffbff768) at ../threads/main.cc:116
The relevant line of code from filehdr.cc is:
IndirectHeader * s;
s = new IndirectHeader;
It crashes on the second line. I thought it might be that I wasn't explicitly using my own constructor, but adding one didn't seem to help. It seems to me like there's some other simple problem i'm not noticing but i haven't been able to find it.. Any advice would be appreciated.
What you're seeing in the backtrace is a crash allocating the memory to back your IndirectHeader. It hasn't even started constructing the object yet because it's still trying to allocate memory for it. Most likely there is a bug earlier in your program, that has corrupted the heap.
I'm trying to find the reason for a segfault which is occurring on the level of system libraries.
I would like get some hints on how to use gdb to examine args of the getenv() call seen in the following stack trace.
As the trace shows - getenv() is not called directly by my code - call is nested in the chain of system calls initiated in my code. Call is starting with my routine a_logmsg() trying to get thread-safe localtime - localtime_r(), and getenv() is called later somewhere within the code of libc. OS is Solaris 8/SPARC.
Program terminated with signal 11, Segmentation fault.
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
(gdb) where
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
#1 0xfed46ab0 in getsystemTZ () from /usr/lib/libc.so.1
#2 0xfed44918 in ltzset_u () from /usr/lib/libc.so.1
#3 0xfed44140 in localtime_r () from /usr/lib/libc.so.1
#4 0x00029c28 in a_logmsg (fmt=0xfd5d0 "%s: no changes to config.") at misc.c:155
#5 0x000273b8 in a_sync_device (device_group=0x11e3ed0 "none", hostname=0xfbbffe8d "router",
config_by=0xfbbffc8f "scheduled_archiving", platform=0x11e3ee0 "cisco", authset=0x11e3ef0 "set01",
arch_method=0xffffcfc8 <Address 0xffffcfc8 out of bounds>) at arch.c:256
#6 0x00027ce8 in a_archive_single (arg=0x1606f50) at arch.c:498
#7 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
#8 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I would like get some hints on how to use gdb to examine args of the getenv() call seen in the following stack trace.
The source for Solaris libc is available here.
You can examine argument to getenv by setting the breakpoint on it, and looking at the registers. You'll need to know the ABI that is used, but it's quite simple -- the argument to getenv is in register i0, and print (char*)$i0 at the (gdb) prompt should print "TZ".
Finally, the most likely reason for a crash in getenv is that you've corrupted the environment earlier. In particular, note that this code is bad:
void buggy()
{
char buf[80];
strcpy(buf, "FOO=BAR");
putenv(buf); // <-- BUG!
}
You could usually examine the environment via one of these commands:
(gdb) x/100s __environ
(gdb) x/100s environ
Chances are, you'll see strings there which do not contain the = sign.
I'm working on an embedded platform (architecture is SH4), and my program crashed a few minutes ago with a SIGABRT.
Luckily, I was running under gdbserver, and the thread that was interrupted by this signal has this stack dump:
#0 0x2a7f1678 in raise () from /home/[user]/target/lib/libc.so.6
#1 0x2a7f2a4c in abort () from /home/[user]/target/lib/libc.so.6
#2 0x2a81ade0 in __libc_message () from /home/[user]/target/lib/libc.so.6
#3 0x2a81f3a8 in malloc_printerr () from /home/[user]/target/lib/libc.so.6
#4 0x2a8c3700 in _IO_wide_data_2 () from /home/[user]/target/lib/libc.so.6
Do you know what happened here? A bad free()? bad delete ? bad malloc?
What's "_IO_wide_data_2" supposed to do?
I see the malloc_printerr() call that I don't understand either.
Google gives me 234 results on this, but all of them are simply because the guys have that "function" in their backtrace.
It is a stream to stderr for wide character support.
You can break it down into various parts:
_IO : Input/Output.
wide_data : Wide data
2 : stderr
You also have;
_IO_wide_data_0 : stdin
_IO_wide_data_1 : stdout
They are chained as 2->1->0.
malloc_printerr() is used to print various error messages when there is something bad happening/caught in dynamic memory management. But your trace looks capped (have you removed anything?).
It could be a write to stderr where you try to write something not in memory, in corrupted memory, in …
Or it could be lower stack point causing write to stderr.
Or …
A bad free()? bad delete ? bad malloc?
Yes I think it's one of these.
If the bug is easy reproducible, put a breakpoint in malloc.c, malloc_printerr. When debugger stops there, You'll probably get full call stack and find the buggy place in Your code. I still don't know why it happens, that after entering __libc_message, the call stack gets broken.
There is how I found this strange behaviour.
Simple app that deletes the same buffer twice:
void main()
{
char * buf = new char[4*1024];
delete[] buf;
delete[] buf;
}
Inside malloc_printerr the call stack looks like this:
#0 malloc_printerr (action=3, str=0x297d0b5c "double free or corruption (top)", ptr=<value optimized out>) at malloc.c:5887
#1 0x29750be8 in __libc_free (mem=0x411008) at malloc.c:3622
#2 0x29612c70 in operator delete (ptr=<value optimized out>) at ../../../../libstdc++-v3/libsupc++/del_op.cc:49
#3 0x29612cc2 in operator delete[] (ptr=<value optimized out>) at ../../../../libstdc++-v3/libsupc++/del_opv.cc:37
#4 0x0040068a in main (argc=1, argv=0x7bb26814) at double_free.cpp:47
After entering __libc_message:
#0 __libc_message (do_abort=2, fmt=0x297d09c8 "*** glibc detected *** %s: %s: 0x%s *** ") at ../sysdeps/unix/sysv/linux/libc_fatal.c:50
#1 0x2974f3a8 in malloc_printerr (action=3, str=0x297d0b5c "double free or corruption (top)", ptr=<value optimized out>) at malloc.c:5887
#2 0x297f3700 in _IO_wide_data_2 () from /cygdrive/c/STM/SH4-Linux-gcc/opt/STM/STLinux2.3/devkit/sh4/target/lib/libc.so.6
Backtrace stopped: frame did not save the PC
Maybe it has something to do with attribute((noreturn)) and compiler optimization?
Can you reproduce this error while running under GDB? You might get more stack trace information using the various "Stack" commands found here:
GDB Cheat Sheet
You might need to move up or down a few stack frames to determine what happened.
We have a binary that generates coredump. So I ran the gdb command to analyze the issue. Please note the binary and code are in two different locations and we cannot build the whole binary using debugging symbols. Hence how and what details can I find from below backtarce:
gdb binary corefile
(gdb) where
#0 0x101fa37a in f1()
#1 0x10203812 in operator f2< ()
#2 0x085f6244 in f3 ()
#3 0x085f1574 in f4()
#4 0x0805b27b in sigsegv_handler ()
#5 <signal handler called>
#6 0x1018d945 in f5()
#7 0x1018e021 in f6()
..................................
#29 0x08055c5c in main ()
(gdb)
Please provide me gdb commands that I can issue to find what’s data inside each stack frame, what’s the issue probably is, where it is failing, other debugging methods if any?
You can use help in gdb. To navigate the stack : help stack
The main useful commands to navigate the stack are up and down. If you have debugging symbols at hand, you can use list to see where you are. Then to get information, you need print (abbreviated as 'p'). For example, if you have an int called myInt then you just type p myInt. With no debug info it will be harder. From your stack frame it seems that the problem is in f5(). One thing you can do is start your program inside gdb. it will stop right where the segfault happens. When you have hints about the part of your code that segfaults, you can compile this code unit with debugging options.
That the basics. Tell us more if you want more help.
my2c
when i run the core file in gdb, gdb doesn't show where the error is coming from or what line
in the application that causes the problem.
i'm using the compiler options -g -DDEBUG -D_DEBUG, but it doesnt seem help.
Any help would be appreciated, thanks.
You could be blowing your stack. For example, after running the following program
#include <stdio.h>
#include <string.h>
int main(void)
{
int a[10];
memset(a, 0, 100 * sizeof a[0]);
return 0;
}
and then running gdb on the resulting core yields
$ gdb oflow core
[...]
Core was generated by `./oflow'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000000 in ?? ()
The output of the where and bt commands isn't terribly useful:
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
ok, problem solved. i had a recursive function that returned a string, but the problem was there was nothing being returned, but i still don't understand why debugging info wasn't generated, when i step through the code it shows the line numbers i'm stepping through, but i guess because the line it was getting an error was missing? so there was no breakpoint for where it went wrong? when it tried to concatenate itself by recursing into the function, using "+=" it would go into the second call but then crash at the end of the function because nothing was being returned. but shouldn't that have generated an error on the first function call on the line where it didn't return?
thanks.