I'm working on an embedded platform (architecture is SH4), and my program crashed a few minutes ago with a SIGABRT.
Luckily, I was running under gdbserver, and the thread that was interrupted by this signal has this stack dump:
#0 0x2a7f1678 in raise () from /home/[user]/target/lib/libc.so.6
#1 0x2a7f2a4c in abort () from /home/[user]/target/lib/libc.so.6
#2 0x2a81ade0 in __libc_message () from /home/[user]/target/lib/libc.so.6
#3 0x2a81f3a8 in malloc_printerr () from /home/[user]/target/lib/libc.so.6
#4 0x2a8c3700 in _IO_wide_data_2 () from /home/[user]/target/lib/libc.so.6
Do you know what happened here? A bad free()? bad delete ? bad malloc?
What's "_IO_wide_data_2" supposed to do?
I see the malloc_printerr() call that I don't understand either.
Google gives me 234 results on this, but all of them are just people who happen to have that "function" in their backtrace.
It is glibc's wide-character data for the stderr stream (a data object, not a function you call).
You can break it down into various parts:
_IO : Input/Output.
wide_data : Wide data
2 : stderr
You also have:
_IO_wide_data_0 : stdin
_IO_wide_data_1 : stdout
They are chained as 2->1->0.
malloc_printerr() prints the error messages glibc emits when it catches something wrong in dynamic memory management. But your trace looks truncated (have you removed anything?).
It could be a write to stderr where you try to write something not in memory, in corrupted memory, in …
Or something lower in the stack could be causing the write to stderr.
Or …
A bad free()? bad delete ? bad malloc?
Yes I think it's one of these.
If the bug is easily reproducible, put a breakpoint on malloc_printerr in malloc.c. When the debugger stops there, you'll probably get the full call stack and find the buggy place in your code. I still don't know why the call stack gets broken after entering __libc_message.
Here is how I found this strange behaviour.
Simple app that deletes the same buffer twice:
int main()
{
    char * buf = new char[4*1024];
    delete[] buf;
    delete[] buf;   // second delete[] of the same pointer: double free
}
Inside malloc_printerr the call stack looks like this:
#0 malloc_printerr (action=3, str=0x297d0b5c "double free or corruption (top)", ptr=<value optimized out>) at malloc.c:5887
#1 0x29750be8 in __libc_free (mem=0x411008) at malloc.c:3622
#2 0x29612c70 in operator delete (ptr=<value optimized out>) at ../../../../libstdc++-v3/libsupc++/del_op.cc:49
#3 0x29612cc2 in operator delete[] (ptr=<value optimized out>) at ../../../../libstdc++-v3/libsupc++/del_opv.cc:37
#4 0x0040068a in main (argc=1, argv=0x7bb26814) at double_free.cpp:47
After entering __libc_message:
#0 __libc_message (do_abort=2, fmt=0x297d09c8 "*** glibc detected *** %s: %s: 0x%s *** ") at ../sysdeps/unix/sysv/linux/libc_fatal.c:50
#1 0x2974f3a8 in malloc_printerr (action=3, str=0x297d0b5c "double free or corruption (top)", ptr=<value optimized out>) at malloc.c:5887
#2 0x297f3700 in _IO_wide_data_2 () from /cygdrive/c/STM/SH4-Linux-gcc/opt/STM/STLinux2.3/devkit/sh4/target/lib/libc.so.6
Backtrace stopped: frame did not save the PC
Maybe it has something to do with __attribute__((noreturn)) and compiler optimization?
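For comparison, here is a minimal sketch of how the double delete in the test app above could be avoided by letting std::unique_ptr own the buffer. The helper names make_buffer and release_twice_safely are made up for illustration and are not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: allocates the same 4 KiB buffer as the test app,
// but hands ownership to std::unique_ptr instead of a raw pointer.
std::unique_ptr<char[]> make_buffer(std::size_t size = 4 * 1024)
{
    return std::unique_ptr<char[]>(new char[size]);
}

// Releases the buffer "twice" without a double free: reset() frees the
// memory and stores nullptr, so the second reset() does nothing.
bool release_twice_safely()
{
    std::unique_ptr<char[]> buf = make_buffer();
    buf[0] = 'x';
    assert(buf[0] == 'x');
    buf.reset();   // delete[] happens here
    buf.reset();   // no-op: the pointer is already null
    return buf == nullptr;
}
```

With a raw pointer the same effect needs a manual buf = nullptr after the first delete[], since deleting a null pointer is harmless.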
Can you reproduce this error while running under GDB? You might get more stack trace information using the various "Stack" commands found here:
GDB Cheat Sheet
You might need to move up or down a few stack frames to determine what happened.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f0412571733 in boost::detail::interruption_checker::~interruption_checker() ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libXyzService.so
Missing separate debuginfos, use: debuginfo-install python-2.6.6-52.x86_64
(gdb) where
#0 0x00007f0412571733 in boost::detail::interruption_checker::~interruption_checker() ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libXyzService.so
#1 0x00007f041181547a in boost::this_thread::sleep(boost::posix_time::ptime const&) () from /usr/lib64/libboost_thread-mt.so.1.41.0
#2 0x00007f040c5ea36c in void boost::this_thread::sleep<boost::posix_time::seconds>(boost::posix_time::seconds const&) ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libAbcLib.so
#3 0x00007f040c5daf63 in healthMonitoring::healthMonitoringController::print(bool) ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libAbcLib.so
#4 0x00007f0411813d10 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.1.41.0
#5 0x000000365d6079d1 in start_thread () from /lib64/libpthread.so.0
#6 0x000000365cee88fd in clone () from /lib64/libc.so.6
(gdb)
As you can see from this backtrace, the segfault is raised in the loaded shared library libXyzService.so. So how can I know from what point in the code of this shared library the segfault was raised?
What is the use of the addresses shown at the start of each frame?
Please let me know if any more detail is needed.
So how can I know from what point in code of this shared library, this
seg fault was raised?
Try to rebuild everything from scratch with optimizations disabled (with -O0 or -Og) and with debug info enabled (-g). And make sure that you are not stripping resulting binaries (not running strip on them).
This should give you more meaningful stack traces with line numbers and file names.
I have the following code:
std::ofstream stat("/opt/lic_status");
if ( stat.is_open() )
{
stat << ver;
stat.close();
}
My problem is that on the first line the execution is blocked. A coredump was generated by a watchdog during this block and it looks like this:
(gdb) bt
#0 0x00cb5430 in __kernel_vsyscall ()
#1 0x00b2833b in open () from /lib/libc.so.6
#2 0x00ac37c8 in _IO_new_file_fopen () from /lib/libc.so.6
#3 0x00ab73dd in __fopen_internal () from /lib/libc.so.6
#4 0x00ab9c4c in fopen64 () from /lib/libc.so.6
#5 0x00d6e877 in std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) () from /usr/lib/libstdc++.so.6
#6 0x00d1d75e in std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) () from /usr/lib/libstdc++.so.6
#7 0x08b625b8 in open () at /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/fstream:699
#8 basic_ofstream () at /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/fstream:628
I need to mention that I don't know what was the state of the /opt/lic_status file when the problem occurred. I don't know if it was opened by other process or even if it existed at all.
Does anyone have any suggestion on what could have caused this?
I only have the coredump, can I get any info out of it?
"I need to mention that I don't know what was the state of the
/opt/lic_status file when the problem occurred. I don't know if it was opened by other process or even if it existed at all."
As far as I understand, none of those file states can make the program block on that line (i.e. where the user-mode program calls open() inside the std::ofstream constructor). Whenever a user-mode program calls the open() system call, the system completes the call and returns an appropriate error code if it fails. The kernel should not simply fail to return to user mode.
Does anyone have any suggestion on what could have caused this? I
only have the coredump, can I get any info out of it?
I can think of two possible causes:
1. The entire system (kernel) is in a bad state for some unknown reason.
2. The program is multi-threaded and some other thread is stuck somewhere. The call stack of this thread looks OK: it is executing in kernel mode inside the open() system call.
In the first case, I believe we cannot do much, and the core dump would not give any extra information to identify or confirm it: a core dump only contains a snapshot of that particular process.
In the second case, we should analyze the core dump further. Once it is loaded, run the following commands at the GDB prompt:
(gdb) info threads
(gdb) thread apply all backtrace
These commands list the threads (if your program is multi-threaded) and print the call stack of each one, which might help you understand the problem. Ignore this if you have already done it.
I'm running into a weird bus error when trying to create an object in C++. This is my gdb backtrace when the program crashes:
#0 0xff146ff4 in _malloc_unlocked () from /usr/lib/libc.so.1
#1 0xff146e40 in malloc () from /usr/lib/libc.so.1
#2 0x24430 in __builtin_new (sz=128) at /usr/local/src/gcc-2.95.1/gcc/cp/new1.cc:84
#3 0x1e71c in FileHeader::Allocate (this=0x3f5d8, freeMap=0x3eea0, fileSize=5719)
at ../filesys/filehdr.cc:63
#4 0x1f61c in FileSystem::Create (this=0x3d8b8, name=0xffbff8f3 "test", initialSize=5719)
at ../filesys/filesys.cc:200
#5 0x1ffac in Copy (from=0xffbff8e4 "assignment 2.c", to=0xffbff8f3 "test")
at ../filesys/fstest.cc:52
#6 0x15150 in main (argc=3, argv=0xffbff768) at ../threads/main.cc:116
The relevant line of code from filehdr.cc is:
IndirectHeader * s;
s = new IndirectHeader;
It crashes on the second line. I thought it might be that I wasn't explicitly using my own constructor, but adding one didn't seem to help. It seems to me like there's some other simple problem I'm not noticing, but I haven't been able to find it. Any advice would be appreciated.
What you're seeing in the backtrace is a crash allocating the memory to back your IndirectHeader. It hasn't even started constructing the object yet because it's still trying to allocate memory for it. Most likely there is a bug earlier in your program, that has corrupted the heap.
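As an illustration of the kind of earlier bug that could corrupt the heap, here is a hypothetical sketch (not taken from the program in the question). The unsafe write is shown only as a comment; the function write_checked is an invented name that uses std::vector::at() so an out-of-bounds index is rejected instead of silently overwriting the allocator's bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// The unsafe pattern would look like this:
//     char* p = new char[8];
//     p[8] = 0;   // one past the end: may overwrite malloc's metadata
// A write like that often goes unnoticed until a later, unrelated
// new/malloc call trips over the damaged heap.

// Checked variant: std::vector::at() throws std::out_of_range instead of
// writing outside the buffer.
bool write_checked(std::vector<char>& buf, std::size_t index, char value)
{
    try {
        buf.at(index) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Small self-check showing the behaviour for an 8-byte buffer.
void self_check()
{
    std::vector<char> buf(8);
    assert(write_checked(buf, 7, 'a'));    // last valid index
    assert(!write_checked(buf, 8, 'b'));   // one past the end is rejected
}
```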
When analyzing a core dumped after a SIGABRT, gdb says that my last line of code executed (before entering library code) is a NULL assignment to a char pointer, as shown below:
gdb:
(gdb) bt full
#0 0x006337a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
No symbol table info available.
#1 0x00674815 in raise () from /lib/tls/libc.so.6
No symbol table info available.
#2 0x00676279 in abort () from /lib/tls/libc.so.6
No symbol table info available.
#3 0x006a8cca in __libc_message () from /lib/tls/libc.so.6
No symbol table info available.
#4 0x006af55f in _int_free () from /lib/tls/libc.so.6
No symbol table info available.
#5 0x006af93a in free () from /lib/tls/libc.so.6
No symbol table info available.
#6 0x00d0b14e in __builtin_delete () from /usr/lib/libstdc++-libc6.1-1.so.2
No symbol table info available.
#7 0x0808181c in MyObject::~MyObject (this=0x84f4db0, __in_chrg=3) at ./MyObject.cpp:16
this = (MyObject *) 0x84f4db0
MyObject.cpp:16 listing:
12: ...
13: MyObject::~MyObject() {
14: if (this->string != NULL) {
15: delete this->string;
16: this->string = NULL;
17: }
18: }
19: ...
First of all, I do not understand why the line 16 would result in that call stack. It would make more sense if it was a result of the execution of line 15, the one with the delete operator (unless "line 16" represents code executed after the destructor's code to free the memory allocated for that object; just guessing here).
Other than that, can anyone point the way to correctly debug that core?
What type does this->string have? Is it a char array? Then you should use delete [] this->string. Is it a pointer to an object? Then that object is either already deleted and the pointer was not nulled, or the object has never been created and the pointer was left uninitialized.
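A minimal sketch of the char array case, assuming string is a char* allocated with new char[]. The real MyObject is not shown in the question, so the constructor, the c_str() accessor and the disabled copy operations here are assumptions, not the original class.

```cpp
#include <cassert>
#include <cstring>

// Sketch only: member layout and constructor are assumed.
class MyObject {
public:
    explicit MyObject(const char* text)
        : string(nullptr)
    {
        assert(text != nullptr);
        string = new char[std::strlen(text) + 1];
        std::strcpy(string, text);
    }

    ~MyObject()
    {
        delete[] string;    // array form matches new char[]; null is fine too
        string = nullptr;
    }

    // Copying is disabled so two objects can never own (and free) the same
    // buffer, which is a common way to end up with a double free.
    MyObject(const MyObject&) = delete;
    MyObject& operator=(const MyObject&) = delete;

    const char* c_str() const { return string; }

private:
    char* string;
};
```

If copies of the object are needed, the copy constructor and assignment operator would have to duplicate the buffer rather than share the pointer.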
The actual crash happened on this line:
15: delete this->string;
The crash happened due to the call to abort inside __libc_message. That last routine printed a message to your standard error, and the message looked something like
*** glibc detected: double free or heap corruption at ... ***
Use Valgrind or AddressSanitizer: they'll point you straight at the problem.
I do not understand why the line 16 would result in that call stack.
When you are looking at the call stack that led to the call to raise, you need to understand that the CALL instruction pushes the address of the next instruction to be executed onto the stack before transferring control to the called procedure, and it is that next instruction that GDB shows you in the backtrace (all debuggers do that). That next instruction may be on the current line, the next line, or 20 lines down.
GDB reports the line that is about to be executed next, which is line 16 in your case. The last statement actually executed was line 15, and that is where it crashed.
Hard to tell from your posting what is wrong here though.
I'm trying to find the reason for a segfault which is occurring at the level of system libraries.
I would like to get some hints on how to use gdb to examine the args of the getenv() call seen in the following stack trace.
As the trace shows, getenv() is not called directly by my code; the call is nested in a chain of library calls initiated from my code. The chain starts with my routine a_logmsg() calling localtime_r() to get a thread-safe local time, and getenv() is called later somewhere inside libc. The OS is Solaris 8/SPARC.
Program terminated with signal 11, Segmentation fault.
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
(gdb) where
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
#1 0xfed46ab0 in getsystemTZ () from /usr/lib/libc.so.1
#2 0xfed44918 in ltzset_u () from /usr/lib/libc.so.1
#3 0xfed44140 in localtime_r () from /usr/lib/libc.so.1
#4 0x00029c28 in a_logmsg (fmt=0xfd5d0 "%s: no changes to config.") at misc.c:155
#5 0x000273b8 in a_sync_device (device_group=0x11e3ed0 "none", hostname=0xfbbffe8d "router",
config_by=0xfbbffc8f "scheduled_archiving", platform=0x11e3ee0 "cisco", authset=0x11e3ef0 "set01",
arch_method=0xffffcfc8 <Address 0xffffcfc8 out of bounds>) at arch.c:256
#6 0x00027ce8 in a_archive_single (arg=0x1606f50) at arch.c:498
#7 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
#8 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I would like to get some hints on how to use gdb to examine the args of the getenv() call seen in the following stack trace.
The source for Solaris libc is available here.
You can examine argument to getenv by setting the breakpoint on it, and looking at the registers. You'll need to know the ABI that is used, but it's quite simple -- the argument to getenv is in register i0, and print (char*)$i0 at the (gdb) prompt should print "TZ".
Finally, the most likely reason for a crash in getenv is that you've corrupted the environment earlier. In particular, note that this code is bad:
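#include <stdlib.h>
#include <string.h>

void buggy()
{
    char buf[80];                 /* lives on the stack */
    strcpy(buf, "FOO=BAR");
    putenv(buf); // <-- BUG! putenv keeps the pointer, not a copy
}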
You can usually examine the environment via one of these commands:
(gdb) x/100s __environ
(gdb) x/100s environ
Chances are, you'll see strings there which do not contain the = sign.
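The putenv pattern shown earlier is a problem because putenv stores the pointer it is given, so the environment ends up pointing into a stack frame that no longer exists once the function returns. Two possible fixes are sketched below. The function names and the FOO2=BAZ entry are invented for illustration, and the first option assumes setenv() is available on your platform (older Solaris releases may lack it, in which case the second option still applies).

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Option 1: setenv() (POSIX) copies the name and value, so the source
// buffer may safely live on the stack.
bool set_foo_copy()
{
    char buf[80];
    strcpy(buf, "BAR");
    bool ok = setenv("FOO", buf, 1) == 0;   // 1 = overwrite an existing value
    memset(buf, 0, sizeof buf);             // the environment has its own copy
    return ok;
}

// Option 2: keep putenv(), but give it storage that outlives the call.
bool set_foo_static()
{
    static char entry[] = "FOO2=BAZ";       // static storage duration
    int rc = putenv(entry);
    assert(rc == 0);
    return rc == 0;
}
```

After either call, getenv("FOO") or getenv("FOO2") returns a pointer to memory that stays valid after the setter has returned.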