If I'm understanding this right, each time you call a C++ function, SP (and possibly BP) get moved to allocate some temporary space on the stack — a stack frame. And when the function returns, the pointers get moved back, deallocating the stack frame again.
But it appears to me that the data on the old stack frame is still there, it's just not referenced any more. Is there some way to make GDB show me these deleted stack frames? (Obviously once you enter a new stack frame it will at least partially overwrite any previous ones... but until then it seems like it should be possible.)
But it appears to me that the data on the old stack frame is still there, it's just not referenced any more.
Correct.
Is there some way to make GDB show me these deleted stack frames?
You can trivially look at the unused stack with GDB examine command. For example:
void fn()
{
int x[100];
for (int j = 0; j < 100; j++) x[j] = (0x1234 << 12) + j;
}
int main()
{
fn();
return 0;
}
Build and debug with:
gcc -g t.c
gdb -q ./a.out
(gdb) start
Temporary breakpoint 1 at 0x115f: file t.c, line 10.
Starting program: /tmp/a.out
Temporary breakpoint 1, main () at t.c:10
10 fn();
(gdb) n
11 return 0;
(gdb) x/40x $rsp-0x40
0x7fffffffdc60: 0x0123405c 0x0123405d 0x0123405e 0x0123405f
0x7fffffffdc70: 0x01234060 0x01234061 0x01234062 0x01234063
0x7fffffffdc80: 0x55555170 0x00005555 0x55555040 0x00000064
0x7fffffffdc90: 0xffffdca0 0x00007fff 0x55555169 0x00005555
0x7fffffffdca0: 0x55555170 0x00005555 0xf7a3a52b 0x00007fff
0x7fffffffdcb0: 0x00000000 0x00000000 0xffffdd88 0x00007fff
0x7fffffffdcc0: 0x00080000 0x00000001 0x5555515b 0x00005555
0x7fffffffdcd0: 0x00000000 0x00000000 0xa91c6994 0xc8f4292d
0x7fffffffdce0: 0x55555040 0x00005555 0xffffdd80 0x00007fff
0x7fffffffdcf0: 0x00000000 0x00000000 0x00000000 0x00000000
Here you can clearly see x still on stack: 0x7fffffffdc60 is where x[92] used to be, 0x7fffffffdc70 is where x[96] used to be, etc.
There is no easy way to make GDB interpret that data as locals of fn though.
Stackframes do not contain any information about its size or boundaries, rather this knowledge is hardcoded into functions' code. There is a (stack) frame pointer register, using which makes it possible to walk the stack up, but not down. In the current function you know the boundaries of the current frame, but there is no information of what could possibly be below it.
Related
I am trying to understand possible sources for "stack smashing" errors in GCC, but not Clang.
Specifically, when I compile a piece of code with just debug symbols
set(CMAKE_CXX_FLAGS_DEBUG "-g")
and use the GCC C++ compiler (GNU 5.4.0), the application crashes with
*** stack smashing detected ***: ./testprogram terminated
Aborted (core dumped)
However, when I use Clang 3.8.0, the program completes without error.
My first thought was that perhaps the canaries of GCC are catching a buffer overrun that Clang isn't. So I added the additional debug flag
set(CMAKE_CXX_FLAGS_DEBUG "-g -fstack-protector-all")
But Clang still compiles a program that runs without errors. To me this suggests that the issue likely is not a buffer overrun (as you commonly see with stack smashing errors), but an allocation issue.
In any case, when I add in the ASAN flags:
set(CMAKE_CXX_FLAGS_DEBUG "-g -fsanitize=address")
Both compilers yield a program that crashes with an identical error. Specifically,
GCC 5.4.0:
==1143==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==1143==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
Aborted (core dumped)
Clang 3.8.0:
==1387==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==1387==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
Aborted (core dumped)
Can somebody give me some hints on the likely source of this error? I am having an awefully hard time tracing down the line where this is occurring, as it is in a very large code base.
EDIT
The issue is unresolved, but is isolated to the following function:
void get_sparsity(Data & data) {
T x[n_vars] = {};
T g[n_constraints] = {};
for (Index j = 0; j < n_vars; j++) {
const T x_j = x[j];
x[j] = NAN;
eval_g(n_vars, x, TRUE, n_constraints, g, &data);
x[j] = x_j;
std::vector<Index> nonzero_entries;
for (Index i = 0; i < n_constraints; i++) {
if (isnan(g[i])) {
data.flattened_nonzero_rows.push_back(i);
data.flattened_nonzero_cols.push_back(j);
nonzero_entries.push_back(i);
}
}
data.nonzeros.push_back(nonzero_entries);
}
int internal_debug_point = 5;
}
which is called like this:
get_sparsity(data);
int external_debug_point= 6;
However, when I put a debug point on the last line of the get_sparsity function, internal_debug_point = 5, it reaches that line without issue. However, when exiting the function, and before it hits the external debug point external_debug_point = 6, it crashes with the error
received signal SIGABRT, Aborted.
0x00007ffffe315428 in __GI_raise (sig=sig#entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
My guess is that GCC is only checking the canaries when exiting that function, and hence the error is actually occurring inside the function. Does that sound reasonable? If so, then is there a way to get GCC or clang to do more frequent canary checks?
I suspect ASan is running out of memory.
I don't think the ASan errors mean your program is trying to allocate that memory, it means ASan is trying to allocate it for itself (it says "shadow memory" which is what ASan uses to keep track of the memory your program allocates).
If the number of iterations (and size of array) n_vars is large, then the function will use extra memory for a new std::vector in every loop, forcing ASan to track more and more memory.
You could try moving the local vector out of the loop (which will likely increase the performance of the function anyway):
std::vector<Index> nonzero_entries;
for (Index j = 0; j < n_vars; j++) {
// ...
for (Index i = 0; i < n_constraints; i++) {
if (isnan(g[i])) {
data.flattened_nonzero_rows.push_back(i);
data.flattened_nonzero_cols.push_back(j);
nonzero_entries.push_back(i);
}
}
data.nonzeros.push_back(nonzero_entries);
nonzero_entries.clear();
}
This will reuse the same memory for nonzero_entries instead of allocating and deallcoating memory for a new vector every iteration.
Trying to figure out the source of the stack problems was getting nowhere. So I tried a different approach. Through debugging, I narrowed down the above function get_sparsity as the culprit. The debugger wasn't giving me any hints exactly WHERE the problem was occurring, but it was somewhere inside that function. With that information, I switched the only two stack variables in that function x and g to heap variables so that valgrind could help me find the error (sgcheck was coming up empty). Specifically, I modified the above code to
void get_sparsity(Data & data) {
std::vector<T> x(n_vars, 0);
std::vector<T> g(n_constraints, 0);
/* However, for our purposes, it is easier to make an std::vector of Eigen
* vectors, where the ith entry of "nonzero_entries" contains a vector of
* indices in g for which g(indices) are nonzero when perturbing x(i).
* If that sounds complicated, just look at the code and compare to
* the code where we use the sparsity structure.
*/
for (Index j = 0; j < n_vars; j++) {
const T x_j = x[j];
x[j] = NAN;
Bool val = eval_g(n_vars, x.data(), TRUE, n_constraints, g.data(), &data);
x[j] = x_j;
std::vector<Index> nonzero_entries;
for (Index i = 0; i < n_constraints; i++) {
if (isnan(g[i])) {
data.flattened_nonzero_rows.push_back(i);
data.flattened_nonzero_cols.push_back(j);
nonzero_entries.push_back(i);
}
}
data.nonzeros.push_back(nonzero_entries);
}
int bob = 5;
return;
}
and then valgrinded it to find the offending line. Now that I know where the problem is occurring, I can fix the problem.
I am working in a project where there are more than 100 files of source code. I am debugging it for find an error.
What I need to find is the time when a particular object assigned to a value. ie. This object is NULL at first, but some other file changes its value, which I don't know.
Are there any method to find when this variable changes its value ?
What I tried upto now is to put a breakpoint on the function where the variable is initilized. I also added a watchpoint. But it does not show any point where the value is changing.
But it does not show any point where the value is changing.
There are two possible explanations:
You've set the watchpoint incorrectly, or
The value changes while the process is in kernel mode (GDB watchpoints are not effective for detecting such change).
Example:
#include <unistd.h>
int global_a;
int global_b;
int foo()
{
global_a = 42;
read(0, &global_b, 1);
return global_a + global_b;
}
int main()
{
return foo();
}
gcc -g -Wall t.c
gdb -q ./a.out
(gdb) start
Temporary breakpoint 1 at 0x400563: file t.c, line 16.
Starting program: /tmp/a.out
Temporary breakpoint 1, main () at t.c:16
16 return foo();
(gdb) watch global_a
Hardware watchpoint 2: global_a
(gdb) watch global_b
Hardware watchpoint 3: global_b
(gdb) c
Continuing.
Hardware watchpoint 2: global_a
Old value = 0
New value = 42
This is a modification of global_a in user-space (via direct assignment), and it triggers watchpoint 2 as expected
foo () at t.c:9
9 read(0, &global_b, 1);
(gdb) c
Continuing.
# Press Enter here
This read 0xA == 10 into global_b.
[Inferior 1 (process 126196) exited with code 064]
Note that the exit code is 064 == 52 == 42+10, but the watchpoint 3 did not fire.
any method to find when this variable changes its value
If you are sure that your "normal" watchpoints are working (e.g. by running above test yourself) and suspect that the variable is being changed via a system call, you can:
Print the address of the variable and
Run your program under strace and look for system calls that could change the value at variable's address.
Using the same example:
(gdb) p &global_b
$1 = (int *) 0x601044 <global_b>
strace -e raw=all ./a.out < /dev/zero
execve(0x7ffe1853e530, 0x7ffe1853f920, 0x7ffe1853f930) = 0
brk(0) = 0x2253000
access(0x7fa66eee48c3, 0) = -1 (errno 2)
mmap(0, 0x2000, 0x3, 0x22, 0xffffffff, 0) = 0x7fa66f0e8000
...
munmap(0x7f57ece26000, 0x203a2) = 0
read(0, 0x601044, 0x1) = 0x1 ### &global_b is "covered" by the buffer being read into
exit_group(42) = ?
+++ exited with 42 +++
Program in C/C++ runs on embedded PowerPC under debugger with HW break points capabilities.
There is global variable 'char Name[256]' known in 2 files and 2 tasks correspondingly. One task reads Name, another fills it with a text, '1234567...', for example.
At some moment, global variable Name gets corrupted. When asked for the variable address gdb shows (and application prints by debug printouts) address equal to 0x31323334.
How to catch this bug with HW breakpoints? I mean at what address to put HWBP.
When I look into assembler, I see:
lis 9,Name#ha
lwz 9,Namel#l(9)
So, how memory corruption can change the code without influencing the application flow - it should crash immediately, no?
Thanks a lot ahead
0x31323334 is "1234" sans null terminator. Further, "Global variable address corruption" does not make much sense "global variables" (whose addresses do not change), nor really for an array of size 256 (unless you're using a pointer somewhere and it's the pointer which is being corrupted). So I suspect you might be unfamiliar with GDB.
When using GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1 on x86 (admittedly, not ppc, but basically the same software) with the following test file:
// g++ test.cpp -g
#include <iostream>
char Name[256] = "123456789";
int main() {
Name[0] = 'a';
std::cout << Name << std::endl;
}
I can get the following output from GDB:
(gdb) break main
Breakpoint 2 at 0x40086a: file test.cpp, line 6.
(gdb) r
Starting program: /home/keithb/dev/mytest/a.out
Breakpoint 2, main () at test.cpp:6
6 Name[0] = 'a';
(gdb) whatis Name
type = char [256]
(gdb) print Name
$1 = "123456789", '\000' <repeats 246 times>
(gdb) print &Name
$2 = (char (*)[256]) 0x6010c0 <Name>
In any case, if you really do want to set a "hardware breakpoint" (GDB calls those "watchpoints"), then you can do get the address of Name prior to corruption. Then just set the watchpoint and wait for your program to write to the address.
(gdb) c
Continuing.
a23456789
[Inferior 1 (process 21878) exited normally]
(gdb) delete 2
(gdb) watch *0x6010c0
Hardware watchpoint 3: *0x6010c0
(gdb) r
Starting program: /home/keithb/dev/mytest/a.out
Hardware watchpoint 3: *0x6010c0
Old value = 875770417
New value = 875770465
main () at test.cpp:7
7 std::cout << Name << std::endl;
(gdb)
I am currently debugging a crash in one of my applications and stumbled up on something that looks a bit odd to me.
In my application I start several threads using _beginthreadex function. At some random point in time (sometimes after several hours) one of the threads crashes for a yet unknown reason. The stack trace at the time of the crash looks like this:
my.dll!thread(void * p=0x241e0140) Line 792 + 0xf bytes C
msvcr90.dll!__endthreadex() + 0x44 bytes
msvcr90.dll!__endthreadex() + 0xd8 bytes
kernel32.dll!_BaseThreadStart#8() + 0x37 bytes
where thread is the main loop function of my thread. I omitted the stack frames above my thread function here. The crash is an access violation and happens each time at the same location with the exact same bad pointer value.
Now what made me think is that __endthreadex occurs in the stack trace. I tried to reproduce this in a small sample program and caused an access violation there too. However the stack trace looks different:
test.exe!thread(void * vpb=0x00000000) Line 9 + 0x3 bytes C++
test.exe!_callthreadstartex() Line 348 + 0x6 bytes C
test.exe!_threadstartex(void * ptd=0x000328e8) Line 326 + 0x5 bytes C
kernel32.dll!_BaseThreadStart#8() + 0x37 bytes
The corresponding code is this:
static unsigned __stdcall thread( void * vpb )
{
int *a = (int*)0xdeadbeef;
*a = 0;
return 1;
}
int main()
{
HANDLE th = (HANDLE)_beginthreadex( NULL, 0, thread, 0, CREATE_SUSPENDED, NULL );
for(int i=0; i<INT_MAX;)
{
++i;
}
ResumeThread(th);
for(int i=0; i<INT_MAX;)
{
++i;
}
return 0;
}
The original code looks similar to this but is obviously more complex. So I'm no wondering if the differences in the stack trace might be caused by my modifications to the code when implementing the small example or if all this shows that there is a problem with a corrupted stack.
So my question is basically: Is it possible that _BaseThreadStart calls __endthreadex which again calls the thread function itself? And if so, can someone explain why this is the case because to me it looks odd and I'm thinking if this points out some problem which might be related to my initial problem of the crashing thread.
Thanks in advance for your suggestions.
I have an object defined in c++ with a pointer to it used in various functions and files throughout the project. I am having an issue with the data being updated, so I want to debug it to see what is happening. Ideally, I want to break every time the object is accessed. however, watch requires a specific memory address. So, for example, if I have:
class data{
public:
int a;
int b;
};
then gdb will only break when a is altered, since the pointer to data is pointed at a, but not when b is altered.
Is there a way to break whenever the entire range of memory covered by the data class is altered?
Is there a way to break whenever the entire range of memory covered by the data class is altered?
Perhaps.
GDB hardware watchpoints use special debug registers in hardware, and there is usually a limit on how such registers work. On x86, you can set up to 4 word-sized hardware watch points, so for example you gave you can set watchpoints on &data->a and &data->b, and that will "cover" entire memory of the data.
I am guessing that your actual data has many more members though, and so 4 word-sized watch points will not suffice.
If you are on platform which has Valgrind support, and if your program can execute under Valgrind, then you can use Valgrind's built-in gdbserver to set watchpoints on arbitrary regions of memory.
Update:
I looked through the page you linked to and couldn't find what I was looking for
I am not sure what you were looking for. Here is a sample session showing how it works:
#include <stdlib.h>
void foo(char *p)
{
*p = 'a';
}
typedef struct {
char buf[1024];
} data;
int main()
{
data *d = calloc(1, sizeof(data));
foo(d->buf + 999);
}
gcc -g main.c
valgrind --vgdb-error=0 ./a.out
...
==10345== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==10345== /path/to/gdb ./a.out
==10345== and then give GDB the following command
==10345== target remote | vgdb --pid=10345
... Valgrind now waits for debugger to attach.
In another window:
gdb ./a.out
GNU gdb (GDB) 7.4
...
(gdb) target remote | vgdb --pid=10345
relaying data between gdb and process 10345
[Switching to Thread 10345]
0x0000000004000af0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) b main
Breakpoint 1 at 0x40053d: file main.c, line 14.
(gdb) c
Breakpoint 1, main () at main.c:14
14 data *d = calloc(1, sizeof(data));
(gdb) n
15 foo(d->buf + 999);
(gdb) watch *d
Hardware watchpoint 2: *d
Note that a "hardware" watchpoint has been set on entire *d.
It's a hardware watchpoint only in the sense that Valgrind is the hardware.
(gdb) p d.buf[999]
$1 = 0 '\000'
(gdb) c
Hardware watchpoint 2: *d
Old value = {buf = '\000' <repeats 1023 times>}
New value = {buf = '\000' <repeats 999 times>, "a", '\000' <repeats 23 times>}
foo (p=0x51b6457 "a") at main.c:6
6 }
(gdb) q
Voila: the debugger stopped when 999th element was modified, proving that the watchpoint "covered" the entire structure.