Hardware breakpoint on C++ reference modification in gdb - c++

The C++ standard says that it is unspecified whether or not a reference requires storage (3.7). However, as far as I understand, gcc implements C++ references as pointers, and as such they can be corrupted.
Is it possible to get the address of a reference in gdb and put a hardware breakpoint on that address in order to find out what corrupts the memory where the reference resides? How can one set such a breakpoint?

GDB supports hardware watchpoints. You can use the watch command for this. Example:
main.cpp:
int main(int argc, char **argv)
{
int a = 0;
int& b = a;
int* c = &a;
*c = 1;
return 0;
}
Start debugging and set breakpoints at the start and at the end of main:
(gdb) b main
Breakpoint 1 at 0x401bc8: file /../main.cpp, line 60.
(gdb) b main.cpp:65
Breakpoint 2 at 0x401be9: file /../main.cpp, line 65.
(gdb) r
Get the address of reference b:
Breakpoint 1, main (argc=1, argv=0x7fffffffddd8) at /../main.cpp:60
60 int a = 0;
(gdb) disas /m
Dump of assembler code for function main(int, char**):
59 {
... (some code omitted)
60 int a = 0;
=> 0x0000000000401bc8 <+11>: movl $0x0,-0x14(%rbp)
61 int& b = a;
0x0000000000401bcf <+18>: lea -0x14(%rbp),%rax
0x0000000000401bd3 <+22>: mov %rax,-0x10(%rbp)
62 int* c = &a;
0x0000000000401bd7 <+26>: lea -0x14(%rbp),%rax
0x0000000000401bdb <+30>: mov %rax,-0x8(%rbp)
63 *c = 1;
0x0000000000401bdf <+34>: mov -0x8(%rbp),%rax
0x0000000000401be3 <+38>: movl $0x1,(%rax)
64
65 return 0;
0x0000000000401be9 <+44>: mov $0x0,%eax
66 }
0x0000000000401bee <+49>: pop %rbp
0x0000000000401bef <+50>: retq
End of assembler dump.
(gdb) p $rbp-0x10
$1 = (void *) 0x7fffffffdce0
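p $rbp-0x10 prints the address of the stack slot that holds reference b: 0x7fffffffdce0.
Set a watchpoint on this address (GDB treats *0x7fffffffdce0 as an int, so only the low 4 bytes of the 8-byte slot are watched; watch *(void **) 0x7fffffffdce0 would cover the whole pointer):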
(gdb) watch *0x7fffffffdce0
Hardware watchpoint 3: *0x7fffffffdce0
(gdb) c
GDB stops only when the watched value changes:
(gdb) c
Continuing.
Hardware watchpoint 3: *0x7fffffffdce0
Old value = -8752
New value = -8996
main (argc=1, argv=0x7fffffffddd8) at /../main.cpp:62
62 int* c = &a;
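Two C++ facts explain why the slot address is read from the disassembly. A minimal sketch, assuming a mainstream ABI such as GCC on x86-64; HoldsReference, HoldsPointer and the helper functions are hypothetical names used only for illustration. The size comparison is implementation behaviour that the standard does not guarantee, while the aliasing check is guaranteed.

```cpp
#include <cstddef>

// Hypothetical wrapper types: sizeof(int&) is just sizeof(int), so the
// reference is wrapped in a struct to observe the storage it takes as a member.
struct HoldsReference { int& ref; };
struct HoldsPointer   { int* ptr; };

// Size of a struct whose only member is a reference.
// Assumption: typical compilers store the reference like a pointer.
constexpr std::size_t reference_member_size() { return sizeof(HoldsReference); }

// Size of a struct whose only member is a pointer, for comparison.
constexpr std::size_t pointer_member_size() { return sizeof(HoldsPointer); }

// Guaranteed by the language: taking the address of a reference yields the
// address of the object it refers to, never the reference's own storage.
bool reference_aliases_target() {
    int a = 0;
    int& b = a;
    b = 1;
    return &b == &a && a == 1;
}
```

Because &b evaluates to the address of a, the storage of the reference itself cannot be named from C++ source, which is presumably why the answer takes the slot address from the disassembly instead.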

Related

Variable changes during the execution. C++

The program crashes with a segmentation fault, so I ran it under gdb. I found that a member of a class is overwritten at runtime. A few facts:
the class is not used in the function where the member is overwritten.
the piece of code where it is overwritten is inside an omp critical section.
I checked the line where the debugger stops: the variables accessed on this line have different memory addresses.
Excerpt from the gdb session:
(gdb) p SrcVelField
$1 = (SNGM::FieldVector *) 0x555555a237a0
(gdb) p *SrcVelField
$2 = {MeshRef = 0x55555592e530, Values = {data_ = 0x555555a237f0}, NbVar = 3,
NbVal = 8820, VarName = {data_ = 0x55555592f000},
FieldName = "VelocityField"}
(gdb) p (*SrcVelField).NbVar
$3 = 3
(gdb) p &(*SrcVelField).NbVar
$4 = (unsigned int *) 0x555555a237b0
(gdb) watch -l *0x555555a237b0
Hardware watchpoint 2: -location *0x555555a237b0
(gdb) cont
Thread 1 "SNGM" hit Hardware watchpoint 2: -location *0x555555a237b0
Old value = 3
New value = -118991381
SNGM::GlobalEval<SNGM::FilterGaussian, 3u, 1u>::PerformGlobalEval ()
at ../SNGM/Numerics/GlobalEval.cpp:288
288 for (unsigned ii = 0; ii < nx; ii++){
(gdb) p &ii
$5 = (unsigned int *) 0x7fffffffc93c
(gdb) p &nx
$6 = (const unsigned int *) 0x7fffffffc950
I would be glad for hints as to why the value of (*SrcVelField).NbVar changes there.
UPDATE
Disassembling the code at this line gives the following:
- problematic rax value
(gdb) info registers
rax 0x555555a237b0 93824997275568
rbx 0x61a8 25000
And instructions to reach it
280 delete[] point;
0x0000555555595bcc <+1113>: cmpq $0x0,-0x58(%rbp)
0x0000555555595bd1 <+1118>: je 0x555555595adc
<PerformGlobalEvalEv._omp_fn.0(void)+873>
0x0000555555595bd7 <+1124>: mov -0x58(%rbp),%rax
0x0000555555595bdb <+1128>: mov %rax,%rdi
0x0000555555595bde <+1131>: callq 0x555555564758
0x0000555555595be3 <+1136>: jmpq 0x555555595adc
<PerformGlobalEvalEv._omp_fn.0(void)+873>
281 } //End for all particles
282
283 #pragma omp critical
0x0000555555595f5b <+2024>: callq 0x555555564ae8
0x0000555555595f78 <+2053>: callq 0x555555564a40
284 {
285
286 for (unsigned kk = 0; kk < nz; kk++){
0x0000555555595f60 <+2029>: movl $0x0,-0x15c(%rbp)
0x0000555555595f6a <+2039>: mov -0x15c(%rbp),%eax
0x0000555555595f70 <+2045>: cmp -0x148(%rbp),%eax
0x0000555555595f76 <+2051>: jb 0x555555595fed
<PerformGlobalEvalEv._omp_fn.0(void)+2170>
0x0000555555596005 <+2194>: addl $0x1,-0x15c(%rbp)
0x000055555559600c <+2201>: jmpq 0x555555595f6a
<PerformGlobalEvalEv._omp_fn.0(void)+2039>
287 for (unsigned jj = 0; jj < ny; jj++){
0x0000555555595fed <+2170>: movl $0x0,-0x158(%rbp)
0x0000555555595ff7 <+2180>: mov -0x158(%rbp),%eax
0x0000555555595ffd <+2186>: cmp -0x144(%rbp),%eax
0x0000555555596003 <+2192>: jb 0x555555596011
<PerformGlobalEvalEv._omp_fn.0(void)+2206>
0x0000555555596029 <+2230>: addl $0x1,-0x158(%rbp)
0x0000555555596030 <+2237>: jmp 0x555555595ff7
<PerformGlobalEvalEv._omp_fn.0(void)+2180>
288 for (unsigned ii = 0; ii < nx; ii++){
0x0000555555596011 <+2206>: movl $0x0,-0x154(%rbp)
0x000055555559601b <+2216>: mov -0x154(%rbp),%eax
0x0000555555596021 <+2222>: cmp -0x140(%rbp),%eax
0x0000555555596027 <+2228>: jb 0x555555596032
<PerformGlobalEvalEv._omp_fn.0(void)+2239>
=> 0x0000555555596177 <+2564>: addl $0x1,-0x154(%rbp)
This is common behaviour when running out of stack/heap size or when an array is indexed out of bounds. Check your stack and heap sizes and verify that you are not exceeding them. Then check all instances of arrays/lists and verify that they are protected against out-of-bounds errors.
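One way to protect an array against out-of-bounds writes, sketched under the assumption that the data can live in a std::vector; write_checked is a hypothetical helper, not something from the code in the question. std::vector::at() throws std::out_of_range instead of silently overwriting whatever happens to sit next to the buffer.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: store v at values[index] with bounds checking.
// Returns true on success and false if index is out of range, in which
// case nothing is written.
bool write_checked(std::vector<unsigned>& values, std::size_t index, unsigned v) {
    try {
        values.at(index) = v;  // at() validates index against values.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Building with -fsanitize=address (GCC and Clang) is another option that may pinpoint the offending write without changing the source.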

Which one is better to use in C and C++ programming?

Is there any difference between the following two code snippets? Which one is better to use? Is one of them faster?
case 1:
int f(int x)
{
int a;
if(x)
a = 42;
else
a = 0;
return a;
}
case 2:
int f(int x)
{
int a;
if(x)
a = 42;
return a;
}
Actually, the two snippets can return totally different results, so neither is "better".
In case 2 you may return the uninitialized variable a, which may result in a garbage value instead of zero.
If you mean this:
int f(int x)
{
int a = 0;
if(x)
a = 42;
return a;
}
then I would say that one is better, since it is more compact (although you are only saving an else, so not much computation is saved anyway).
The question is not "which one is better". The question is "will both work?"
And the answer is no, they will not both work. One is correct, the other is out of the question. So, performance is not even an issue.
The following results in a having either an "indeterminate value" or an "unspecified value", as defined in the C99 standard, sections 3.17.2 and 3.17.3 (probably the latter, though it is not clear to me).
int a;
if(x)
a = 42;
return a;
This in turn means that the function will return an unspecified value, so there are absolutely no guarantees as to what value you will get.
If you are unlucky, you might get zero, and thus proceed to use the above terrible piece of code without knowing that you are bound to have lots of trouble with it later.
If you are lucky, you will get something like 0x719Ab32d right away, so you will immediately know that you messed up.
Any decent C compiler will give you a warning if you try to compile this, so the fact that you are asking this question means that you do not have a sufficient number of warnings enabled. Do not try to write C code (or any code) without the maximum possible number of warnings enabled; it never leads to any good. Find out how to enable warnings on your C compiler, and enable as many of them as you can.
Note: I assume the uninitialized a in your second snippet is a typo and that you meant int a = 0.
We can use gdb to check the difference:
(gdb) list f1
19 {
20 int a;
21 if (x)
22 a = 42;
23 else
24 a = 0;
25 return a;
26 }
(gdb) list f2
28 int f2(int x)
29 {
30 int a = 0;
31 if (x)
32 a = 42;
33 return a;
34 }
Now let's look at the assembler code with -O3:
(gdb) disassemble f1
Dump of assembler code for function f1:
0x00000000004007a0 <+0>: cmp $0x1,%edi
0x00000000004007a3 <+3>: sbb %eax,%eax
0x00000000004007a5 <+5>: not %eax
0x00000000004007a7 <+7>: and $0x2a,%eax
0x00000000004007aa <+10>: retq
End of assembler dump.
(gdb) disassemble f2
Dump of assembler code for function f2:
0x00000000004007b0 <+0>: cmp $0x1,%edi
0x00000000004007b3 <+3>: sbb %eax,%eax
0x00000000004007b5 <+5>: not %eax
0x00000000004007b7 <+7>: and $0x2a,%eax
0x00000000004007ba <+10>: retq
End of assembler dump.
As you can see, there is no difference. Let us disable the optimizations with -O0:
(gdb) disassemble f1
Dump of assembler code for function f1:
0x00000000004006cd <+0>: push %rbp
0x00000000004006ce <+1>: mov %rsp,%rbp
0x00000000004006d1 <+4>: mov %edi,-0x14(%rbp)
0x00000000004006d4 <+7>: cmpl $0x0,-0x14(%rbp)
0x00000000004006d8 <+11>: je 0x4006e3 <f1+22>
0x00000000004006da <+13>: movl $0x2a,-0x4(%rbp)
0x00000000004006e1 <+20>: jmp 0x4006ea <f1+29>
0x00000000004006e3 <+22>: movl $0x0,-0x4(%rbp)
0x00000000004006ea <+29>: mov -0x4(%rbp),%eax
0x00000000004006ed <+32>: pop %rbp
0x00000000004006ee <+33>: retq
End of assembler dump.
(gdb) disassemble f2
Dump of assembler code for function f2:
0x00000000004006ef <+0>: push %rbp
0x00000000004006f0 <+1>: mov %rsp,%rbp
0x00000000004006f3 <+4>: mov %edi,-0x14(%rbp)
0x00000000004006f6 <+7>: movl $0x0,-0x4(%rbp)
0x00000000004006fd <+14>: cmpl $0x0,-0x14(%rbp)
0x0000000000400701 <+18>: je 0x40070a <f2+27>
0x0000000000400703 <+20>: movl $0x2a,-0x4(%rbp)
0x000000000040070a <+27>: mov -0x4(%rbp),%eax
0x000000000040070d <+30>: pop %rbp
0x000000000040070e <+31>: retq
End of assembler dump.
Now there is a difference: on average, for random arguments x, the first version should be slightly faster, as it executes one mov fewer than the second.
In case your second code is
int f(int x)
{
int a=0;
if(x)
a = 42;
return a;
}
and not
int f(int x)
{
int a;
if(x)
a = 42;
return a;
}
It doesn't matter. The compiler will convert them to the same optimized code.
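The two well-defined variants can also be compared at the source level. A small sketch; the names f_else, f_init and same_results are hypothetical and only serve to keep both versions in one file.

```cpp
// Version with an explicit else branch (case 1 in the question).
int f_else(int x) {
    int a;
    if (x)
        a = 42;
    else
        a = 0;
    return a;
}

// Version that initializes a up front (the corrected case 2).
int f_init(int x) {
    int a = 0;
    if (x)
        a = 42;
    return a;
}

// Returns true if both versions agree for every x in [lo, hi].
bool same_results(int lo, int hi) {
    for (int x = lo; x <= hi; ++x) {
        if (f_else(x) != f_init(x)) {
            return false;
        }
    }
    return true;
}
```

Both functions return 42 for any nonzero x and 0 otherwise, so the choice between them is a matter of style rather than behaviour.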
I would prefer this (your second snippet):
int f(int x) {
int a = 0;
if (x) {
a = 42;
}
return a;
}
Everything should always have braces. Even if I only have one line in the if block now, I may add more later.
I don't put the braces on their own lines because it's a pointless waste of space.
I rarely put the block on the same line as the conditional for readability.
You don't need extra space for a in either case; you can do something like this:
int f(int x)
{
if(x)
return 42;
else
return 0;
}
BTW in your second function you have not initialised a.

GDB: Print the value of memory address

According to https://www.ethicalhacker.net/columns/heffner/intro-to-assembly-and-reverse-engineering
mov 0xffffffb4,0x1
moves the number 1 into 0xffffffb4.
So, I decided to test this on my own.
In GDB, x is the command to print the value of memory address.
However, when I run
x 0x00000000004004fc
I'm not getting the value of 133 (decimal) or 85 (hexadecimal)
Instead, I'm getting 0x85f445c7. Any idea what is this?
me@box:~/c$ gdb -q test
Reading symbols from test...done.
(gdb) l
1 #include <stdio.h>
2
3 int main(){
4 int a = 1;
5 int b = 13;
6 int c = 133;
7 printf("Value of C : %d\n",c);
8 return 0;
9 }
(gdb) b 7
Breakpoint 1 at 0x400503: file test.c, line 7.
(gdb) r
Starting program: /home/me/c/test
Breakpoint 1, main () at test.c:7
7 printf("Value of C : %d\n",c);
(gdb)
Disassemble
(gdb) disas
Dump of assembler code for function main:
0x00000000004004e6 <+0>: push %rbp
0x00000000004004e7 <+1>: mov %rsp,%rbp
0x00000000004004ea <+4>: sub $0x10,%rsp
0x00000000004004ee <+8>: movl $0x1,-0x4(%rbp)
0x00000000004004f5 <+15>: movl $0xd,-0x8(%rbp)
0x00000000004004fc <+22>: movl $0x85,-0xc(%rbp)
=> 0x0000000000400503 <+29>: mov -0xc(%rbp),%eax
0x0000000000400506 <+32>: mov %eax,%esi
0x0000000000400508 <+34>: mov $0x4005a4,%edi
0x000000000040050d <+39>: mov $0x0,%eax
0x0000000000400512 <+44>: callq 0x4003c0 <printf@plt>
0x0000000000400517 <+49>: mov $0x0,%eax
0x000000000040051c <+54>: leaveq
0x000000000040051d <+55>: retq
End of assembler dump.
(gdb) x 0x00000000004004fc
0x4004fc <main+22>: 0x85f445c7
(gdb)
TL;DR
To print a value in GDB, use the print command (p for short).
In your command
x 0x00000000004004fc
you used x, which examines memory at an address, instead of p, which evaluates an expression. To print an expression's value in hexadecimal, use p with the /x format, like below:
(gdb) p/x 0x00000000004004fc
If the memory address is a pointer to some structure, then you have to cast the address before dereferencing it. For example, if
struct node {
int data;
struct node *next;
};
is the structure and you have the address of an instance of it, then to view the contents of that memory location you have to use
(gdb) p *(struct node *) 0x00000000004004fc
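The same cast can be written in C++ to make clear what the gdb expression does. A minimal sketch, assuming the address really holds a live node object; read_data_at is a hypothetical helper name.

```cpp
#include <cstdint>

struct node {
    int data;
    node* next;
};

// Hypothetical helper: interpret a raw address as a node and read its data
// field, mirroring what `p *(struct node *) ADDRESS` does in gdb.
// Assumption: address was obtained from a pointer to a live node object.
int read_data_at(std::uintptr_t address) {
    const node* n = reinterpret_cast<const node*>(address);
    return n->data;
}
```

Without the cast there is no type information, so neither gdb nor the compiler can know how to lay out the fields at that address.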
Notable:
The command
x 0x00000000004004fc
will examine the memory at that address, which holds the encoding of this instruction:
0x00000000004004fc <+22>: movl $0x85,-0xc(%rbp)
... as you can see, the left column (the address) equals the value passed to the command (the address to read).
In the instruction, 0x85 is the immediate value being stored (133 in decimal), not an address. x reads the first four bytes of the instruction's encoding (c7 45 f4 85) and prints them as one little-endian word, 0x85f445c7, so the 0x85 byte shows up as the most significant byte of the printed value.
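The printed word can be reproduced by assembling the instruction's first four bytes in little-endian order. A minimal sketch, assuming the encoding of movl $0x85,-0xc(%rbp) begins with the bytes c7 45 f4 85 (opcode, ModRM, 8-bit displacement, low byte of the immediate); le32 and kInstructionBytes are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical helper: combine four bytes, lowest address first, into the
// 32-bit word a little-endian machine such as x86-64 would read from them.
std::uint32_t le32(const unsigned char* bytes) {
    return static_cast<std::uint32_t>(bytes[0])
         | static_cast<std::uint32_t>(bytes[1]) << 8
         | static_cast<std::uint32_t>(bytes[2]) << 16
         | static_cast<std::uint32_t>(bytes[3]) << 24;
}

// Assumed first four bytes at 0x4004fc: opcode c7, ModRM 45,
// displacement f4 (-0xc), and 85, the low byte of the immediate 0x85.
const unsigned char kInstructionBytes[4] = {0xc7, 0x45, 0xf4, 0x85};
```

Here le32(kInstructionBytes) yields 0x85f445c7, the word gdb printed. To see the stored value 133 itself, examine the variable's address rather than the instruction's, for example with x/dw $rbp-0xc or p c at the breakpoint.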

Stack content for g++4.8

I have written this simple piece of code for testing buffer overflow:
#include <stdio.h>
#include <string.h>
using namespace std;
int f(int x, int y, char *s){
char buf[4];
strcpy(buf,s);
return 0;
}
int main(int argc, char** argv){
f(2,3,argv[1]);
return 0;
}
Then compiling and viewing its execution with gdb (g++ 4.8.4)
g++ -g -fno-stack-protector -o bo bo.c
gdb bo
...
b f
r "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
p $rbp // 0x7fffffffdc90
p $rsp // 0x7fffffffdc70
x/20xw $rsp
0x7fffffffdc70: 0xffffe0ef 0x00007fff 0x00000003 0x00000002
0x7fffffffdc80: 0xffffdcb0 0x00007fff 0x00000000 0x00000000
0x7fffffffdc90: 0xffffdcb0 0x00007fff 0x00400585 0x00000000
0x7fffffffdca0: 0xffffdd98 0x00007fff 0x00000000 0x00000002
0x7fffffffdcb0: 0x00000000 0x00000000 0xf7a36ec5 0x00007fff
My understanding is that the stack grows downward to lower addresses, but it looks like this stack frame (from 0x7fffffffdc70 to 0x7fffffffdc90) is growing upward: the parameters are pushed upward (s, y, then x). Why is that?
It looks like the return address (0x00400585) is pushed first. But what are the meanings of the subsequent words? Are they:
Saved $rbp?
What are the next 2 words?
To see what happens to your stack after f is called, disassemble it in gdb:
(gdb) disas
Dump of assembler code for function f(int, int, char*):
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: sub $0x20,%rsp
0x0000000000400535 <+8>: mov %edi,-0x14(%rbp)
0x0000000000400538 <+11>: mov %esi,-0x18(%rbp)
0x000000000040053b <+14>: mov %rdx,-0x20(%rbp)
0x000000000040053f <+18>: mov -0x20(%rbp),%rdx
0x0000000000400543 <+22>: lea -0x10(%rbp),%rax
0x0000000000400547 <+26>: mov %rdx,%rsi
0x000000000040054a <+29>: mov %rax,%rdi
=> 0x000000000040054d <+32>: callq 0x400410 <strcpy@plt>
0x0000000000400552 <+37>: mov $0x0,%eax
0x0000000000400557 <+42>: leaveq
0x0000000000400558 <+43>: retq
Before the call to strcpy the stack looks like this (I use 64-bit formatting rather than 32-bit):
(gdb) x/6xg $rsp
0x7fffffffddb0: 0x00007fffffffe297 0x0000000200000003
0x7fffffffddc0: 0x00007fffffffddf0 0x0000000000000000
0x7fffffffddd0: 0x00007fffffffddf0 0x0000000000400585
So you can see:
0x0000000000400585 - return address of the function f.
right next to it 0x00007fffffffddf0 - pushed on the stack by 0x000000000040052d <+0>: push %rbp
the next 4 values were reserved on stack via
0x0000000000400531 <+4>: sub $0x20,%rsp
you can see parameters 2 and 3 being saved on the stack prior to the call to strcpy (0x0000000200000003, because ints are only 4 bytes long).
You can also deduce other values on the stack from the disassembly.
The top of the stack is at the beginning (address 0x7fffffffddb0) and the addresses get bigger (e.g. 0x7fffffffddd0 for the third line) so you can see the stack really grows downwards but is shown upside down by gdb.
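How the two int parameters share one 8-byte slot can be checked by splitting the giant word from the dump. A minimal sketch, assuming the little-endian layout of x86-64; IntPair and split_giant_word are hypothetical names.

```cpp
#include <cstdint>

// Two 32-bit values that share one 64-bit stack slot.
struct IntPair {
    std::uint32_t low;   // stored at the lower address (-0x18(%rbp) in the listing)
    std::uint32_t high;  // stored four bytes higher (-0x14(%rbp) in the listing)
};

// Hypothetical helper: split a giant word, as shown by x/xg on a
// little-endian machine, into the two 32-bit values it contains.
IntPair split_giant_word(std::uint64_t word) {
    return IntPair{static_cast<std::uint32_t>(word & 0xffffffffu),
                   static_cast<std::uint32_t>(word >> 32)};
}
```

Applied to 0x0000000200000003, the low half is 3 (y, saved from %esi at -0x18(%rbp)) and the high half is 2 (x, saved from %edi at -0x14(%rbp)), matching the mov instructions in the disassembly.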

How to disassemble elf stripped file in gdb?

How can I disassemble a file in gdb after it has been processed with the strip command?
You can use the GDB x/i command, e.g.
(gdb) x/4i 0x400390
0x400390: xor %ebp,%ebp
0x400392: mov %rdx,%r9
0x400395: pop %rsi
0x400396: mov %rsp,%rdx
But what you are probably looking for is objdump -d a.out
You can also use the disassemble command. It works like x /i, but it has the optional r and m flags which, respectively, show you the raw encoding of the instructions and the corresponding source code line numbers.
With disassemble /rm:
(gdb) p free
$1 = {void (void *)} 0x7ffff7df0980 <free>
(gdb) disassemble /rm free,+13
Dump of assembler code from 0x7ffff7df0980 to 0x7ffff7df098d:
121 in dl-minimal.c
0x00007ffff7df0987 <free+7>: 53 push %rbx
0x00007ffff7df0988 <free+8>: 48 89 fb mov %rdi,%rbx
122 in dl-minimal.c
123 in dl-minimal.c
0x00007ffff7df0980 <free+0>: 48 3b 3d 49 d8 20 00 cmp 0x20d849(%rip),%rdi # 0x7ffff7ffe1d0 <alloc_last_block>
0x00007ffff7df098b <free+11>: 74 03 je 0x7ffff7df0990 <free+16>
End of assembler dump
With x /i:
(gdb) p free
$3 = {void (void *)} 0x7ffff7df0980 <free>
(gdb) x /4i free
0x7ffff7df0980 <free>: cmp 0x20d849(%rip),%rdi # 0x7ffff7ffe1d0 <alloc_last_block>
0x7ffff7df0987 <free+7>: push %rbx
0x7ffff7df0988 <free+8>: mov %rdi,%rbx
0x7ffff7df098b <free+11>: je 0x7ffff7df0990 <free+16>
The advantage (depending on your needs) of x /i over disassemble though, is that x /i accepts a size in instructions whereas disassemble takes a size in bytes.