I am looking into gdb, for breakpoint implementation. For ease I took the very first GDB release 2.51 (released 1988). I see the break point insert data as -
#define BREAKPOINT {0x4e, 0x4f}
what is 0x4e and 0x4f ?
The 0x4E 0x4F is the Motorola 68000 machine instruction for "TRAP #15". The TRAP instruction forces an exception to occur, and on certain platforms trap #15 is defined as a breakpoint exception. This is why 0x4E and 0x4F appear in your GDB 2.51 source for handling platforms such as sun3.
References:
This Motorola M68000 Family Programmer's Reference Manual contains the details of the TRAP instruction on page 4-188. Specifically, the instruction is represented by the 12-bit value 010011100100 followed by the 4-bit "vector" (in this case, 1111). So "TRAP #15" is represented by 0x4E 0x4F. Sun specifically uses vector 15 for breakpoint/tracing -- a Google search reveals numerous comments and source code examples.
I've seen from a Numberphile video (https://youtu.be/1S0aBV-Waeo) a way to run a buffer overflow, and I wanted to try it out.
I have written a piece of code, which is identical to the one shown in the video except for the size of "buffer", but, if I give in input a string bigger than the size of "buffer", I am not getting a segmentation fault, as it was shown in the video; can someone explain why?
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv){
char buffer[50];
strcpy(buffer, argv[1]);
return 0;
}
Edit:
By the way, since the comments suggest that the compiler is a determining factor here, I am using the GCC compiler.
I am not getting a segmentation fault, as it was shown in the video; can someone explain why?
The program has undefined behavior as you're inputting a string bigger than the size of buffer and from strcpy documentation:
To avoid overflows, the size of the array pointed by destination shall be long enough to contain the same C string as source (including the terminating null character), and should not overlap in memory with source.
(emphasis mine)
Undefined behavior means anything¹ can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior. The program may just crash.
So the output that you're seeing (or may be seeing) is a result of undefined behavior. As I said, don't rely on the output of a program that has UB.
So the first step to make the program correct would be to remove UB. Then and only then you can start reasoning about the output of the program.
¹ For a more technically accurate definition of undefined behavior, see this, where it is mentioned that there are no restrictions on the behavior of the program.
If I am correct that you wanted to understand what happened in your specific case, you could improve your question by providing the version of the compiler, the arguments you passed to the compiler, the arguments you passed to your program, and the output of your program. That way, you would have a Minimal Reproducible Example and we would understand better what your specific case is.
For example, I use GCC 9.4.0:
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Here is what happened when I compiled without optimization and passed a string with 55 characters as an argument to the program:
$ gcc -o bufferoverflow bufferoverflow.c
$ ./bufferoverflow 1234567890123456789012345678901234567890123456789012345
$
So, even though the number of bytes copied into the buffer, 56 including the terminator, should cause a write past the end of the buffer, the program ran without any error that is visible by simply looking at standard error or standard output.
Here is what happened when I ran the same executable but passed a 57 character string in the command line.
$ ./bufferoverflow 123456789012345678901234567890123456789012345678901234567
*** stack smashing detected ***: terminated
Aborted (core dumped)
$
One way to understand what happened in the case with the 55 character string is to run it again using gdb, which can be started as shown:
$ gdb bufferoverflow
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bufferoverflow...
(No debugging symbols found in bufferoverflow)
(gdb)
Now let's see why passing a 55 character string as the first argument didn't result in an obvious failure:
(gdb) break main
Breakpoint 1 at 0x1169
(gdb) r 1234567890123456789012345678901234567890123456789012345
Starting program: /home/tim/bufferoverflow 1234567890123456789012345678901234567890123456789012345
Breakpoint 1, 0x0000555555555169 in main ()
(gdb) x/23i main
=> 0x555555555169 <main>: endbr64
0x55555555516d <main+4>: push %rbp
0x55555555516e <main+5>: mov %rsp,%rbp
0x555555555171 <main+8>: sub $0x50,%rsp
0x555555555175 <main+12>: mov %edi,-0x44(%rbp)
0x555555555178 <main+15>: mov %rsi,-0x50(%rbp)
0x55555555517c <main+19>: mov %fs:0x28,%rax
0x555555555185 <main+28>: mov %rax,-0x8(%rbp)
0x555555555189 <main+32>: xor %eax,%eax
0x55555555518b <main+34>: mov -0x50(%rbp),%rax
0x55555555518f <main+38>: add $0x8,%rax
0x555555555193 <main+42>: mov (%rax),%rdx
0x555555555196 <main+45>: lea -0x40(%rbp),%rax
0x55555555519a <main+49>: mov %rdx,%rsi
0x55555555519d <main+52>: mov %rax,%rdi
0x5555555551a0 <main+55>: callq 0x555555555060 <strcpy@plt>
0x5555555551a5 <main+60>: mov $0x0,%eax
0x5555555551aa <main+65>: mov -0x8(%rbp),%rcx
0x5555555551ae <main+69>: xor %fs:0x28,%rcx
0x5555555551b7 <main+78>: je 0x5555555551be <main+85>
0x5555555551b9 <main+80>: callq 0x555555555070 <__stack_chk_fail@plt>
0x5555555551be <main+85>: leaveq
0x5555555551bf <main+86>: retq
From the above disassembly we can see that main+60 is just after the call to strcpy. We can also see, by looking at main+45 and main+52 that the buffer is at %rbp-0x40. We can continue to that point and look at what happened to the buffer:
(gdb) b *(main+60)
Breakpoint 2 at 0x5555555551a5
(gdb) c
Continuing.
Breakpoint 2, 0x00005555555551a5 in main ()
(gdb) x/56bx $rbp-0x40
0x7fffffffdf90: 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38
0x7fffffffdf98: 0x39 0x30 0x31 0x32 0x33 0x34 0x35 0x36
0x7fffffffdfa0: 0x37 0x38 0x39 0x30 0x31 0x32 0x33 0x34
0x7fffffffdfa8: 0x35 0x36 0x37 0x38 0x39 0x30 0x31 0x32
0x7fffffffdfb0: 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x30
0x7fffffffdfb8: 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38
0x7fffffffdfc0: 0x39 0x30 0x31 0x32 0x33 0x34 0x35 0x00
So we can see that, in spite of the fact that when we ran with this string earlier without gdb we didn't notice any obvious error, in fact the buffer overflow did occur. We simply didn't notice that it had. To understand why we didn't notice, one only has to look at the disassembly to see that the next used address on the stack is at %rbp-8 which is 56 bytes after %rbp-0x40. So the overflow went onto memory that was not in use.
The same disassembly shows why we get the stack smashing detected message when we run the program with the 57 character string. In that case, we clobber part of the 8-byte value at %rbp-8 which is used (at main+19, main+28, main+65, main+69 and main+78) as a check for whether the stack got corrupted during the call to main. So the reason we see that particular error with that particular input is that the 8-byte value at %rbp-8 was the only part of the stack that we clobbered that was actually used after we clobbered it and the message in question was as a result of noticing that those 8 bytes had changed.
Even if you did not compile your program exactly the way I did, and even if you did not use exactly the same input, I hope I have given you some solid ideas about how to understand the behavior in your case.
In Borland, there is a macro __emit__, "a pseudo-function that injects literal values directly into the object code" (James Holderness).
Is there an equivalent for gcc / g++?
(I can't seem to find one in the documentation)
If not, how could I implement it in my C++ source code?
Usage can be found at Metamorphic Code Examples
You can take a look at .byte assembler directive:
asm __volatile__ (".byte 0xEA, 0x00, 0x00, 0xFF, 0xFF");
GCC's optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations.
Anyway you should pay attention to many corner cases (e.g. gcc skips asm code after goto...)
I cannot set 4 byte read / write access hardware breakpoint using windbg.
0:000> dd 02e80dcf
02e80dcf 13121110 17161514 1a191800 1e1d1c1b
02e80ddf 011c171f c7be7df1 00000066 4e454900
Actually I have to check when the value 0x13121110 (at address 0x02e80dcf) is getting changed/overwritten by the program.
So when I try to set a 4 byte write access hardware breakpoint @ 0x02e80dcf, I get a "Data breakpoint must be aligned" error.
0:000> ba w 4 02e80dcf
Data breakpoint must be aligned
^ Syntax error in 'ba w 4 02e80dcf'
0:000> ba r 4 02e80dcf
Data breakpoint must be aligned
^ Syntax error in 'ba r 4 02e80dcf'
0:000> ba w 1 02e80dcf
breakpoint 0 redefined
I'm able to set a 1 byte write access breakpoint at the address, but it is not getting triggered when the pointer @ address 0x02e80dcf is overwritten.
And also if anyone could suggest any other way to detect the address overwritten thing would be really helpful.
Note: I'm facing this problem with one particular program. I'm able to set a 4 byte hardware breakpoint in the same debugging environment.
As a side note, this particular behavior is from the CPU architecture itself (not from the system or the debugger).
The x86 and x86-64 (IA-32 and IA-32e in Intel lingo) architectures use the DRx debug registers to handle hardware breakpoints.
The DR7 LENn fields set the length of each breakpoint, and DR0 to DR3 hold the breakpoint addresses.
From the Intel manual, volume 3B, chapter 18.2.5, "Breakpoint Field Recognition":
The LENn fields permit specification of a 1-, 2-, 4-, or 8-byte range,
beginning at the linear address specified in the corresponding debug
register (DRn).
In the same chapter it is explicitly stated:
Two-byte ranges must be aligned on word boundaries; 4-byte ranges must
be aligned on doubleword boundaries.
If you cover the desired address with a data breakpoint with a big enough length, then it will trap (breakpoint will be hit):
A data breakpoint for reading or writing data is triggered if any of
the bytes participating in an access is within the range defined by a
breakpoint address register and its LENn field.
The manual then goes on to give a tip for trapping on an unaligned address, along with an example table:
A data breakpoint for an unaligned operand can be constructed using
two breakpoints, where each breakpoint is byte-aligned and the two
breakpoints together cover the operand.
Addresses must be aligned on a 4-byte boundary (or larger for 64-bit systems).
Any hex address ending in 0xf is not aligned to a 4-byte boundary.
There may be a restriction in WinDbg that data breakpoints must be aligned to 4 or 8 byte boundaries. You may need to use a conditional breakpoint so that only the one byte is checked.
I have a program with an inner loop that needs to be very very fast due to the number of iterations it performs. To profile this code I have been using valgrind/callgrind. I find it to be a wonderful tool. Unfortunately my efforts at optimization have taken me into using newer instruction sets like fma (intel) / fma4 (amd), and whenever I use these callgrind blows up because it does not support those instructions.
I understand that one solution is to simply not use those intrinsics and make the compiler emit code that does not contain those instructions, but honestly I see no point in that. I want to profile the code as it is, not as valgrind can handle it.
This brings me to my question. Are there any open source or free profilers out there that can do as good a job as valgrind/callgrind? I know about gprof, but as I understand it, it essentially just stops the program at intervals and sees where it is and counts the number of times it sees each thing, which is like tearing out an eye compared to what callgrind gives me.
I would probably stick with valgrind/callgrind:
Trying out the compile flags -mavx and -mfma4 causes issues for me too on different processors. FMA4 is primarily an AMD feature, although support for it is filtering into Intel chips, whereas AVX is primarily an Intel feature (with support being filtered into AMD chips). However, in benchmarks, AVX on AMD, when supported, actually performs slower than using SSE1/2/3/4 (FMA4 fills in for SSE5 [1, 2, 3]).
Using both optimisations is perhaps not the best approach and may well lead to the behaviour you are experiencing, as they effectively stand in opposition of each other, being primarily designed for specific brands of processors. Try removing FMA4 if you are compiling for an Intel CPU that supports AVX and using FMA4 if compiling for an AMD processor that supports FMA4.
That having been said, the compiler will not allow the combination of a multiply and an add into an FMA, because that would reduce 2 roundings to 1 rounding in the FMA. Hence you would need to use a relaxed floating point model (something like -ffast-math), or else fail IEEE floating point compliance by converting a multiply and add to an FMA. I am not sure how it works when you call the intrinsics specifically, but the compiler might not optimise them based on flags, as they are very specific instructions.
The FMA flag (-mfma4) on my Intel CPUs produces the same result reliably, with valgrind throwing similar hissy fits to the one you have posted; however, it behaves fine on the AMD CPU machines (I take it your processor is an Intel?):
vex amd64->IR: unhandled instruction bytes: 0xC4 0x43 0x19 0x6B 0xE5 0xE0 0xF2 0x44
vex amd64->IR: REX=0 REX.W=0 REX.R=1 REX.X=0 REX.B=1
vex amd64->IR: VEX=1 VEX.L=0 VEX.nVVVV=0xC ESC=0F3A
vex amd64->IR: PFX.66=1 PFX.F2=0 PFX.F3=0
This is from the test code below.
FMA3 Intrinsics: (AVX2 - Intel Haswell)
_mm_fmadd_pd(), _mm256_fmadd_pd()
_mm_fmadd_ps(), _mm256_fmadd_ps()
and many many more besides....
FMA4 Intrinsics: (XOP - AMD Bulldozer)
_mm_macc_pd(), _mm256_macc_pd()
_mm_macc_ps(), _mm256_macc_ps()
and many many more besides....
Notes
FMA offers support for features that were scheduled to be part of SSE5 such as:
XOP: Integer vector multiply–accumulate instructions, integer vector horizontal addition, integer vector compare, shift and rotate instructions, byte permutation and conditional move instructions, floating point fraction extraction.
FMA4: Floating-point vector multiply–accumulate.
F16C: Half-precision floating-point conversion.
Test Code
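#include <stdio.h>  /* printf */

/* Plain multiply followed by add; whether this becomes an FMA
   instruction depends on the compiler flags discussed above. */
float vfmaddsd_func(float f1, float f2, float f3){
    return f1*f2 + f3;
}
int main() {
    float f1,f2,f3;
    f1 = 1.1;
    f2 = 2.2;
    f3 = 3.3;
    float f4 = vfmaddsd_func(f1,f2,f3);
    printf("%f\n", f4);
    return 0;
}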
I am studying the book "Hacking The Art of Exploitation 2nd Edition" on my own and have reached the first set of obstacles.
In GDB I can understand that this code:
x/x $rip
Will examine the register $rip and output in hexadecimal.
But what does this code do:
x/2x $rip
The book says it is examining multiple units at the target address. But does that mean it is showing the value of $rip the next 2 times it changes. Or does it mean something else?
One more question as Columbo would say. After I invoke the examine command, I get:
0x100000f00 <main+8> 0x00fc45c7
What does main+8 mean?
x/x $rip Will examine the register $rip and output in hexadecimal.
That's incorrect: it will examine memory pointed to by $rip. If you wanted to examine $rip itself, you'd use print/x $rip.
But what does this code do: x/2x $rip
It examines two words of memory, pointed to by $rip.
What does main+8 mean
It means that you are looking at memory containing instructions, at offset 8 from the start of main()