I don't know why I can't see this backtrace. The symbols from my own binary are loaded, and the package libc6-dbg is installed. Do I need to tell gdb where to find the libc symbols?
Program received signal SIGSEGV, Segmentation fault.
__memcpy_ia32 () at ../sysdeps/i386/i686/multiarch/../memcpy.S:74
74 ../sysdeps/i386/i686/multiarch/../memcpy.S: No such file or directory.
(gdb) bt full
#0 __memcpy_ia32 () at ../sysdeps/i386/i686/multiarch/../memcpy.S:74
No locals.
#1 0x00000000 in ?? ()
No symbol table info available.
(gdb)
From your backtrace, it is possible that you have a stack corruption that is overwriting your return address (mainly because there are only two frames and no information is available about the code calling memcpy). Is it possible that you're using memcpy on an address on the stack?
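One way to check for this kind of corruption is with gdb's watch command:
The most important part is to narrow down the call that is doing the corrupting. In your case it should be a call to memcpy or something close to it.
Once you have a suspicious function, add a breakpoint on it.
Run until the breakpoint is reached.
Set a watchpoint on the stack slot that holds the calling function's return address (info frame lists it as "eip at 0x..." under "Saved registers"). Watch the memory at that address, not the address itself, because gdb refuses to watch a constant value: watch *(void **) 0xXXXXXX
Run until the watchpoint is triggered.
If the return address is overwritten, gdb should stop on the corrupting call.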
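To make the memcpy suspicion concrete: an unchecked memcpy into a fixed-size local buffer can run past the buffer and overwrite the saved registers and return address stored beyond it, which would be consistent with the 0x00000000 in ?? () frame above. A minimal sketch of checking the length before copying, using a hypothetical helper name copy_checked (not a libc function):

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src into dst only if they fit in dst_size bytes.
   Returns 0 on success; returns -1 and copies nothing if n is too large.
   copy_checked is a hypothetical helper, not part of any library. */
int copy_checked(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (n > dst_size)
        return -1;  /* refuse instead of writing past the end of dst */
    memcpy(dst, src, n);
    return 0;
}
```

For example, with a local char buf[16], copy_checked(buf, sizeof buf, src, n) rejects any n above 16. Temporarily wrapping a suspect memcpy like this is a cheap way to confirm or rule it out while narrowing down the corrupting call.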
Related
I'm investigating a SIGSEGV using gdb. This is the last stack frame as seen by gdb attached to the running process about to crash:
#0 0x08d1805c in FooBar (this=0xa9315578, dt=0.100000001)
At that point, I've saved the state using generate-core-file. When I then inspect this dump with gdb, the same stack frame reads:
#0 0x08d1805c in FooBar (this=0x0, dt=2.69049305e-42)
This confuses me. On the one hand, the value 0.1 for dt in the live situation makes sense. On the other hand, this being 0x0, as seen in the dump, would nicely explain the SIGSEGV.
More importantly, how could there be a discrepancy at all?
I use the gdb command below to locate the segmentation fault, but it shows ??, which confuses me. What does it mean, and how can I avoid it?
$ gdb program core
...
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000048d0000048c in ?? ()
(gdb) bt
#0 0x0000046a00000469 in ?? ()
#1 0x0000046c0000046b in ?? ()
#2 0x0000046e0000046d in ?? ()
#3 0x000004700000046f in ?? ()
#4 0x0000047300000472 in ?? ()
#5 0x0000047600000475 in ?? ()
#6 0x0000047800000477 in ?? ()
#7 0x0000047a00000479 in ?? ()
#8 0x0000047d0000047b in ?? ()
...
I found that an array was being accessed out of bounds and fixed it, but I am still confused by the behavior above.
0x0000048d0000048c
This looks like you've called a function through a function pointer, but that pointer has been overwritten with two integers: 0x48d == 1165 and 0x48c == 1164 (do these values look like something that your program is using?).
You should use bt to tell you how you got there.
You should probably use Valgrind or Address Sanitizer to check for uninitialized or dangling memory and buffer overflows (some of the common ways to end up with an invalid function pointer).
Update:
Now that you show the stack trace, it's almost 100% certain that you have some local array of integers which you've overflowed (filling it with values like 1129, 1130, 1131, etc.), thus corrupting your stack.
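The mechanism can be checked directly. On little-endian x86_64, two consecutive 32-bit ints in memory read back as one 64-bit word with the first int in the low half. So an int array holding 1129, 1130, 1131, 1132 (0x469 to 0x46c), read 8 bytes at a time the way a saved return address is read, gives exactly frames #0 and #1 of the backtrace. A minimal sketch, assuming a little-endian target; word_at is a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 8 bytes starting at element 2*pair of an int32_t array as one
   uint64_t, like the CPU reading a saved return address from the stack.
   memcpy avoids alignment and strict-aliasing problems.
   Assumes a little-endian target such as x86_64. */
uint64_t word_at(const int32_t *arr, size_t pair)
{
    uint64_t w;
    memcpy(&w, arr + 2 * pair, sizeof w);
    return w;
}
```

With int32_t a[] = {1129, 1130, 1131, 1132}, word_at(a, 0) is 0x0000046a00000469 and word_at(a, 1) is 0x0000046c0000046b, the same values gdb printed as "addresses" in frames #0 and #1.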
Address Sanitizer (available in recent versions of GCC) should point you straight at where the bug is.
This means that your program crashed in a function unknown to gdb (one that is not in the symbol table).
Try these two options, in the given order:
If you are debugging a target, make sure that all layers of your code are compiled with -g (if you are using gcc).
You can manually give gdb the symbol table with the command file "binary_with_symbol_table", and it will then show you the function and the address of the bug.
Note that many exceptions may be hidden behind a segmentation fault.
After moving from CentOS 4 to CentOS 5, I'm seeing gdb crash when I try to call a member function on a std::vector:
(gdb) p actionQueue->size()
Program received signal SIGSEGV, Segmentation fault.
0x081d881e in indexedActionQueue::size (this=0xbfbd0050) at actionList.h:52
The program being debugged was signaled while in a function called from GDB.
GDB has restored the context to what it was before the call.
To change this behavior use "set unwindonsignal off".
Evaluation of the expression containing the function
(indexedActionQueue::size() const) will be abandoned.
(gdb) p actionQueue->_actionQueue
$1 = std::vector of length 7, capacity 7 = {
[...]
}
(gdb) p actionQueue->_actionQueue.empty()
Program received signal SIGSEGV, Segmentation fault.
0x08158ea3 in std::vector<CAction, std::allocator<CAction> >::empty (this=0xa514788) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/stl_vector.h:735
The program being debugged was signaled while in a function called from GDB.
GDB has restored the context to what it was before the call.
To change this behavior use "set unwindonsignal off".
Evaluation of the expression containing the function
(std::vector<CAction, std::allocator<CAction> >::empty() const) will be abandoned.
(gdb) p actionQueue->_actionQueue.begin()
Program received signal SIGSEGV, Segmentation fault.
0x08188ec6 in std::vector<CAction, std::allocator<CAction> >::begin (this=0xa514788) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/stl_vector.h:539
The program being debugged was signaled while in a function called from GDB.
GDB has restored the context to what it was before the call.
To change this behavior use "set unwindonsignal off".
Evaluation of the expression containing the function
(std::vector<CAction, std::allocator<CAction> >::begin()) will be abandoned.
I tried updating gdb to v7.9, but I got the same results. I'm using scl enable devtoolset-2 to build with gcc v4.8.2. If I print the std::vector in gdb, it looks correct.
If I call the methods that get a SEGV directly from the C++ application, they run without a SEGV, but running them manually in gdb produces a SEGV.
How can I get gdb to allow these calls to work?
My program recently crashed with the following stack:
Program terminated with signal 7, Bus error.
#0 0x00007f0f323beb55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f0f323beb55 in raise () from /lib64/libc.so.6
#1 0x00007f0f35f8042e in skgesigOSCrash () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#2 0x00007f0f36222ca9 in kpeDbgSignalHandler () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#3 0x00007f0f35f8063e in skgesig_sigactionHandler () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#4 <signal handler called>
What should I check in my code to avoid this? Or is this something Oracle should fix?
The main reasons you could get a bus error revolve around inaccessible memory. This could be due to many things:
Accessing through a deleted pointer.
Accessing through an uninitialized pointer.
Accessing through a NULL pointer.
Accessing an address that is not yours, for example because of an overflow error.
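To find out which address actually triggered the bus error, one option is a SIGBUS handler installed with sigaction and SA_SIGINFO: for a hardware fault, the kernel passes the faulting address in si_addr. This is a minimal sketch under these assumptions: Linux/POSIX; a standalone reproduction, because Oracle's client library installs its own handler (skgesig_sigactionHandler in the trace above) which may replace this one; and install_sigbus_logger, on_sigbus and bus_seen are hypothetical names. SA_RESETHAND restores the default action on entry, so after logging, a genuine fault re-executes and still terminates the process at the original location.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler so the caller can tell it ran (hypothetical name). */
static volatile sig_atomic_t bus_seen = 0;

/* Write "0x" + 16 hex digits + '\n' into buf (19 bytes).
   Hand-rolled because printf is not async-signal-safe. */
static void hex_u64(uint64_t v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    buf[0] = '0';
    buf[1] = 'x';
    for (int i = 17; i >= 2; i--) {
        buf[i] = digits[v & 0xf];
        v >>= 4;
    }
    buf[18] = '\n';
}

/* SA_SIGINFO handler: for a hardware fault, si_addr is the faulting address. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    static const char prefix[] = "SIGBUS at address ";
    char addr[19];
    ssize_t r;

    (void)sig;
    (void)ctx;
    hex_u64((uint64_t)(uintptr_t)info->si_addr, addr);
    /* write() is async-signal-safe; short writes are ignored in this sketch. */
    r = write(STDERR_FILENO, prefix, sizeof prefix - 1);
    r = write(STDERR_FILENO, addr, sizeof addr);
    (void)r;
    bus_seen = 1;
    /* SA_RESETHAND already restored SIG_DFL. For a genuine fault, returning
       re-executes the faulting instruction, which now gets the default
       action (terminate, with a core dump if enabled). */
}

/* Install the logger; returns 0 on success, -1 on failure (errno set). */
int install_sigbus_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The printed address can then be passed to gdb's info symbol, or compared with the mappings in /proc/PID/maps, to see whether it lies in freed, unmapped or foreign memory.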
Try adding the following to the $ORACLE_HOME/network/admin/*.ora file:
DIAG_ADR_ENABLED=OFF
DIAG_SIGHANDLER_ENABLED=FALSE
DIAG_DDE_ENABLED=FALSE
This sounds like an Oracle issue.
Also, Oracle's libraries seem to be compiled with Intel compilers.
Here is the Valgrind report:
==14546== Thread 5:
==14546== Invalid free() / delete / delete[]
==14546== at 0x490555D: free (vg_replace_malloc.c:235)
==14546== by 0x3BF7EFAA8F: free_mem (in /lib64/tls/libc-2.3.4.so)
==14546== by 0x3BF7EFA581: __libc_freeres (in /lib64/tls/libc-2.3.4.so)
==14546== by 0x4802676: _vgw_freeres (vg_preloaded.c:62)
==14546== Address 0x4DC4EE0 is not stack'd, malloc'd or (recently) free'd
How can I tell which thread it is, given that the thread number varies from one execution to another? Will assigning names to my threads help here?
EDIT: I don't think it will, as this is mentioned in the DRD section of the manual.
I'm using valgrind-3.1.1 on Red Hat enterprise Linux AS4.
You are likely freeing a global variable (the address 0x4DC4EE0 is very close to where globals live by default on Linux/x86_64).
Run the program under GDB, then do info symbol 0x4DC4EE0, and GDB should tell you all you need to know.
Update: Valgrind 3.6 already reports the global symbol. For example, given this buggy program:
#include <stdlib.h>

int x;

int main()
{
  free(&x);
  return 0;
}
Valgrind 3.6 reports:
==18731== Invalid free() / delete / delete[]
==18731== at 0x4C240E8: free /tmp/vg/coregrind/m_replacemalloc/vg_replace_malloc.c:394
==18731== by 0x4004AA: main /home/t.c:7
==18731== Address 0x60089c is 0 bytes inside data symbol "x"
I finally found the explanation for this: my unit-test executable was linked to a [third party] library it didn't use. I re-linked it without that library and the problem went away.
Also, the error was detected in __libc_freeres(), a GNU libc function that frees resources at the end of execution. The problem might lie in the library or in glibc.
The following Linux-specific Valgrind option can be used to avoid this error: --run-libc-freeres=no. Note that this can make leak detection less accurate, since memory still held by libc may then be reported as leaked.
You can use the DRD_GET_DRD_THREADID macro to display each thread's ID when the thread starts. You can also print a name alongside it to help. See the DRD manual.
EDIT: Maybe I wasn't specific enough, but you'll need to include Valgrind's headers (valgrind/drd.h) when you build a debug version of your code; the client-request macros are header-only, so no extra Valgrind library needs to be linked. You can call DRD_GET_DRD_THREADID from within the thread, together with a name you assigned when it starts, and then write that information to a file or to the console. I don't think there's a way to tell DRD to print the name, so you have to combine the two.
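A minimal sketch of that combination, assuming Valgrind's valgrind/drd.h is installed (the code falls back to 0 when the header is missing). log_thread_start and the thread names are hypothetical. Each thread prints its label and DRD thread ID as it starts, which should make it possible to relate the names to the thread numbers in the output of valgrind --tool=drd.

```c
#include <stdio.h>

/* Use Valgrind's DRD header when it is installed (it is header-only). */
#ifdef __has_include
#  if __has_include(<valgrind/drd.h>)
#    include <valgrind/drd.h>
#  endif
#endif

#ifndef DRD_GET_DRD_THREADID
/* Fallback when the header is missing: behave as when not running under DRD. */
#  define DRD_GET_DRD_THREADID 0u
#endif

/* Call this first thing in each thread's start routine. `name` is the label
   you chose for the thread; it is only printed, not registered with Valgrind.
   Returns the DRD thread ID, expected to be 0 when not running under DRD. */
unsigned log_thread_start(const char *name)
{
    unsigned id = (unsigned)DRD_GET_DRD_THREADID;

    fprintf(stderr, "thread \"%s\": DRD thread id %u\n", name, id);
    return id;
}
```

For example, a worker's start routine could begin with log_thread_start("worker-1"). Run natively it prints id 0; under DRD it should print the ID DRD assigned to that thread.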