GDB - What is the mysterious Assembly code? - c++

Dump of assembler code for function main:
0x0804833e <+0>: push %ebp
0x0804833f <+1>: mov %esp,%ebp
0x08048341 <+3>: sub $0x8,%esp
0x08048344 <+6>: and $0xfffffff0,%esp
0x08048347 <+9>: mov $0x0,%eax
0x0804834c <+14>: add $0xf,%eax
0x0804834f <+17>: add $0xf,%eax
0x08048352 <+20>: shr $0x4,%eax
0x08048355 <+23>: shl $0x4,%eax
0x08048358 <+26>: sub %eax,%esp
=> 0x0804835a <+28>: movl $0x10,-0x4(%ebp)
0x08048361 <+35>: movl $0x0,-0x8(%ebp)
0x08048368 <+42>: pushl -0x4(%ebp)
0x0804836b <+45>: call 0x8048334 <myfunc1 at test.c:4>
0x08048370 <+50>: add $0x4,%esp
0x08048373 <+53>: pushl -0x8(%ebp)
0x08048376 <+56>: call 0x8048339 <myfunc2 at test.c:8>
0x0804837b <+61>: add $0x4,%esp
0x0804837e <+64>: mov $0x0,%eax
0x08048383 <+69>: leave
0x08048384 <+70>: ret
End of assembler dump.
(gdb) info line
Line 16 of "test.c" starts at address 0x804835a <main+28 at test.c:16> and ends at 0x8048361 <main+35 at test.c:17>.  <---- (1)
(gdb) shell cat test.c
#include<stdio.h>
void myfunc1(int recv_arg1)
{
/* does nothing */
}
void myfunc2(int recv_arg1)
{
/* does nothing */
}
int main(int argc,char **argv)
{
int var1;
int var2;
var1 = 16;
var2 = 0;
myfunc1(var1);
myfunc2(var2);
return 0;
}
Note in (1) that the asm code for the statements of main starts within that range, so the asm code before this range is for something else. What is it? Surely something mysterious!

Allow me to comment this for you.
0x0804833e <+0>: push %ebp ; Establish standard
0x0804833f <+1>: mov %esp,%ebp ; stack frame record
0x08048341 <+3>: sub $0x8,%esp ; Make room for locals
0x08048344 <+6>: and $0xfffffff0,%esp ; Align esp down to a 16-byte boundary
0x08048347 <+9>: mov $0x0,%eax ; eax=0
0x0804834c <+14>: add $0xf,%eax ; eax=0xf
0x0804834f <+17>: add $0xf,%eax ; eax=(eax + 0xf)=0x1e
0x08048352 <+20>: shr $0x4,%eax ; ( >> 4)
0x08048355 <+23>: shl $0x4,%eax ; ( << 4)
;The above math rounds up eax as set by 0x0804834c to the next 16-byte boundary
;In this case, eax will be 0x10, rounded up from 0x0f. Did you compile without
;optimizations? Could this be a "probe" checking whether the upcoming call
;will fail?
0x08048358 <+26>: sub %eax,%esp ; Make room for "0x10 more mystery bytes"
0x0804835a <+28>: movl $0x10,-0x4(%ebp) ; var1 = 16
0x08048361 <+35>: movl $0x0,-0x8(%ebp) ; var2 = 0
0x08048368 <+42>: pushl -0x4(%ebp) ; push var1
0x0804836b <+45>: call 0x8048334 <myfunc1 at test.c:4> ;myfunc1( );
0x08048370 <+50>: add $0x4,%esp ; pop (var1)
0x08048373 <+53>: pushl -0x8(%ebp) ; push var2
0x08048376 <+56>: call 0x8048339 <myfunc2 at test.c:8> ;myfunc2( );
0x0804837b <+61>: add $0x4,%esp ; pop (var2)
0x0804837e <+64>: mov $0x0,%eax ; return 0;
0x08048383 <+69>: leave ; undo standard stack frame
0x08048384 <+70>: ret ; actual return
I think it is a good question why 0x08048358 finally executes, allocating seemingly useless space. I suspect this is a check for an esp out-of-range exception before performing the call. If you specify the processor you are using, I wonder whether this will "go away"; it smells like it might be for a specific chip's errata.

The code from 0x0804833e <+0> up to (and including) 0x08048358 <+26> is setting up what is known as a stack frame.
The first four statements are very standard. First you save the old base pointer (which is called the frame pointer in the Wikipedia article). You then set up a new base pointer by using the current value of the stack pointer.
Next, you decrement the stack pointer to make room for your local variables (notice you subtract 0x8, which is enough for your two ints). Finally, it makes sure the stack pointer is aligned to a 16-byte boundary.
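The alignment step is a single bit mask: `and $0xfffffff0,%esp` clears the low four bits of esp. A minimal C sketch of the same arithmetic (the function name and the sample addresses in the test are made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper mirroring `and $0xfffffff0,%esp`: clearing the
   low 4 bits rounds an address down to the nearest multiple of 16.
   Rounding down is safe because the stack grows toward lower addresses,
   so it can only add a little extra room, never remove any. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}
```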
The next group of lines (from 0x08048347 <+9> to 0x08048358 <+26>) is a bit odd. The effect is to grow the stack further, but I'm at a loss to explain why it uses 5 instructions to compute the value (since no variable is involved, the compiler should be able to compute it at compile time) or why it needs to grow the stack more.
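One way to read those five instructions is the usual round-up-to-16 idiom applied to a constant: the first add supplies 0xf, and the second add plus the shift pair rounds it up, giving 0x10. That reading is an interpretation, but the arithmetic itself can be replayed in C (the names are illustrative only):

```c
#include <stdint.h>

/* Replays <+9>..<+23> on a C variable standing in for eax. */
uint32_t mystery_eax(void)
{
    uint32_t eax = 0x0;   /* mov $0x0,%eax              */
    eax += 0xf;           /* add $0xf,%eax   -> 0x0f    */
    eax += 0xf;           /* add $0xf,%eax   -> 0x1e    */
    eax >>= 4;            /* shr $0x4,%eax   -> 0x01    */
    eax <<= 4;            /* shl $0x4,%eax   -> 0x10    */
    return eax;           /* amount subtracted from esp at <+26> */
}

/* The same pattern as a general "round n up to a multiple of 16". */
uint32_t round_up_16(uint32_t n)
{
    return ((n + 0xf) >> 4) << 4;
}
```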

This is a guess... I'm not quite sure I understand the question correctly.
<+3> to <+26> look a bit frivolous. Perhaps it's to make the variable declarations explicit in code to ease debugging? I bet nearly all that code would disappear if optimizations were enabled.
Edit:
Now that I've learned to horizontally scroll, I see this does appear to be what you're referring to. That message is saying line 16 (the first assignment) starts at main+28.
All the code before that is setting up the stack to hold the local variables.

Functions often need a prologue and an epilogue (this depends on the calling convention, a bit on the processor too, and so on). The prologue sets up everything needed for local variables, passed-in arguments, and possibly other things. The epilogue "clears" what the prologue has done.
The exact code produced depends on the compiler and its version. For example, running gcc -S on your C code, I get different output, and of course adding -On options changes the output too.
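The prologue and epilogue described above can be modelled with a toy 32-bit machine in C. This is only a sketch under simplifying assumptions: a small array stands in for memory, addresses are byte offsets into it, and the struct and function names are invented. It covers `push %ebp; mov %esp,%ebp; sub $N,%esp` and `leave`, but not the alignment step:

```c
#include <stdint.h>

/* Toy machine: byte addresses, 4-byte stack slots, stack grows down. */
#define STACK_BYTES 256u

struct toy_cpu {
    uint32_t esp, ebp;
    uint32_t mem[STACK_BYTES / 4];   /* word-sized backing store */
};

static void push(struct toy_cpu *c, uint32_t v)
{
    c->esp -= 4;                     /* push: decrement esp, then store */
    c->mem[c->esp / 4] = v;
}

static uint32_t pop(struct toy_cpu *c)
{
    uint32_t v = c->mem[c->esp / 4]; /* pop: load, then increment esp */
    c->esp += 4;
    return v;
}

/* push %ebp; mov %esp,%ebp; sub $locals,%esp */
void prologue(struct toy_cpu *c, uint32_t locals)
{
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= locals;
}

/* leave is equivalent to: mov %ebp,%esp; pop %ebp */
void leave(struct toy_cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}
```

After `prologue(&c, 8)`, ebp points at the saved ebp and the two 4-byte locals sit at ebp-4 and ebp-8, matching var1 and var2 in the listing; `leave` puts both registers back.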
0x0804833e <+0>: push %ebp
save current ebp register
0x0804833f <+1>: mov %esp,%ebp
copy esp register to ebp (aka base pointer or frame pointer)
0x08048341 <+3>: sub $0x8,%esp
make room on the stack (for two 32-bit integers)
0x08048344 <+6>: and $0xfffffff0,%esp
align stack to multiple of 16
0x08048347 <+9>: mov $0x0,%eax
eax = 0
0x0804834c <+14>: add $0xf,%eax
eax += 15
0x0804834f <+17>: add $0xf,%eax
eax += 15 (eax == 30)
0x08048352 <+20>: shr $0x4,%eax
0x08048355 <+23>: shl $0x4,%eax
total effect: zeroes the least significant nibble of eax:
30 = b:11110 -> eax = b:10000 (16)
0x08048358 <+26>: sub %eax,%esp
16 more bytes of room on the stack
esp -> dword room made by last esp-eax
dword
dword
dword
... possibly padding because of alignment
dword first of the two dwords created by esp-8 (var2)
dword (var1)
ebp -> dword original ebp ptr
...
=> 0x0804835a <+28>: movl $0x10,-0x4(%ebp)
put 16 in -4(ebp), so we can tell that it is var1
0x08048361 <+35>: movl $0x0,-0x8(%ebp)
put 0 in -8(ebp), so we can tell that it is var2
0x08048368 <+42>: pushl -0x4(%ebp)
0x0804836b <+45>: call 0x8048334 <myfunc1 at test.c:4>
pass var1 to myfunc1 (args are passed on the stack, by convention)
0x08048370 <+50>: add $0x4,%esp
and cleaning the stack is up to the caller
0x08048373 <+53>: pushl -0x8(%ebp)
0x08048376 <+56>: call 0x8048339 <myfunc2 at test.c:8>
0x0804837b <+61>: add $0x4,%esp
pass var2 to myfunc2 and "clear" the stack
0x0804837e <+64>: mov $0x0,%eax
return value (0)
0x08048383 <+69>: leave
is the same as doing esp = ebp; pop ebp, i.e. move the stack pointer
back to where it was right after the first push, and then restore
the original ebp value
0x08048384 <+70>: ret
return to the caller (return 0 <- eax)
This code is suboptimal. It does unneeded things, and it is not what I get with gcc v4.3.2 without optimizations. In particular, two immediate adds can become a single add (even at the most basic stage of default optimization), and a shr-shl pair can become a single and. In fact, this code looks stranger to me than "normal" compiler output.
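Both rewrites are easy to check in C. This sketch only tests sample values rather than proving the identities, and the function names are made up:

```c
#include <stdbool.h>
#include <stdint.h>

/* shr $4 followed by shl $4 clears exactly the low 4 bits,
   which is what a single `and $0xfffffff0` does. */
bool shr_shl_equals_and(uint32_t x)
{
    return ((x >> 4) << 4) == (x & ~UINT32_C(0xf));
}

/* Two `add $0xf` instructions fold into one `add $0x1e`
   (unsigned arithmetic wraps modulo 2^32 in both cases). */
bool two_adds_equal_one(uint32_t x)
{
    return (x + 0xfu + 0xfu) == (x + 0x1eu);
}
```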

Related

GDB disassembly address different when adding breakpoint [duplicate]

Here is my function with line numbers
8 | void function(char* string) {
9 | char buffer[16];
10| strcpy(buffer,string);
11| }
Here is gdb disassemble function output
0x000011d4 <+0>: push %ebp
0x000011d5 <+1>: mov %esp,%ebp
0x000011d7 <+3>: push %ebx
0x000011d8 <+4>: sub $0x14,%esp
0x000011db <+7>: call 0x123d <__x86.get_pc_thunk.ax>
0x000011e0 <+12>: add $0x2e20,%eax
0x000011e5 <+17>: sub $0x8,%esp <---- I want a breakpoint here
0x000011e8 <+20>: pushl 0x8(%ebp)
0x000011eb <+23>: lea -0x18(%ebp),%edx
0x000011ee <+26>: push %edx
0x000011ef <+27>: mov %eax,%ebx
0x000011f1 <+29>: call 0x1030 <strcpy@plt>
0x000011f6 <+34>: add $0x10,%esp
0x000011f9 <+37>: nop
0x000011fa <+38>: mov -0x4(%ebp),%ebx
0x000011fd <+41>: leave
0x000011fe <+42>: ret
If I set a breakpoint at 0x000011e5 using the following command,
(gdb) b *0x000011e5
and run the program, gdb ignores all breakpoints and the program exits.
But if I specify
b 9, it works.
Here is the output
(gdb) b 10
Breakpoint 1 at 0x4011e5: file hello.c, line 10.
Why are the addresses different?
Why are the addresses different?
Because you have a position-independent executable, which is linked at address 0, but relocated to a different address at runtime.
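The numbers in the question fit this exactly: the disassembly (read before the program runs) shows the link-time address 0x11e5, and `b 10` reports 0x4011e5, a shift of 0x400000. A C sketch of that arithmetic; the helper names are hypothetical, and the 0x400000 base is only inferred from those two numbers:

```c
#include <stdint.h>

/* In a PIE, link-time addresses are offsets from 0; at run time
   every one of them is shifted by the same load base. */
uint32_t runtime_address(uint32_t load_base, uint32_t link_address)
{
    return load_base + link_address;
}

/* Recover the load base from any one address known both ways. */
uint32_t load_base_from(uint32_t runtime_addr, uint32_t link_addr)
{
    return runtime_addr - link_addr;
}
```

Since `b *0x000011e5` uses the unshifted link-time address, it points at memory the relocated program never executes, which fits the breakpoint never being hit.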

GDB can't create a breakpoint [duplicate]

I am working on implementing a simple stack overflow, which I am examining with gdb. A problem I keep running into is gdb not accepting my breakpoints. My C code is quite simple:
#include <stdio.h>
void function(int a, int b, int c) {
/* ...stuff... */
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
I'm using gcc -m32 -fno-stack-protector -o example3test example3test.c to compile it.
I tried setting a simple breakpoint on the instruction at <+42>, just to test whether it works.
(gdb) disass main
Dump of assembler code for function main:
0x000005d1 <+0>: lea 0x4(%esp),%ecx
0x000005d5 <+4>: and $0xfffffff0,%esp
0x000005d8 <+7>: pushl -0x4(%ecx)
0x000005db <+10>: push %ebp
0x000005dc <+11>: mov %esp,%ebp
0x000005de <+13>: push %ebx
0x000005df <+14>: push %ecx
0x000005e0 <+15>: sub $0x10,%esp
0x000005e3 <+18>: call 0x470 <__x86.get_pc_thunk.bx>
0x000005e8 <+23>: add $0x1a18,%ebx
0x000005ee <+29>: movl $0x0,-0xc(%ebp)
0x000005f5 <+36>: push $0x3
0x000005f7 <+38>: push $0x2
0x000005f9 <+40>: push $0x1
0x000005fb <+42>: call 0x5a0 <function>
0x00000600 <+47>: add $0xc,%esp
0x00000603 <+50>: movl $0x1,-0xc(%ebp)
0x0000060a <+57>: sub $0x8,%esp
0x0000060d <+60>: pushl -0xc(%ebp)
0x00000610 <+63>: lea -0x1950(%ebx),%eax
0x00000616 <+69>: push %eax
0x00000617 <+70>: call 0x400 <printf@plt>
0x0000061c <+75>: add $0x10,%esp
0x0000061f <+78>: nop
0x00000620 <+79>: lea -0x8(%ebp),%esp
0x00000623 <+82>: pop %ecx
0x00000624 <+83>: pop %ebx
0x00000625 <+84>: pop %ebp
0x00000626 <+85>: lea -0x4(%ecx),%esp
0x00000629 <+88>: ret
End of assembler dump.
(gdb) break *0x000005fb
Breakpoint 1 at 0x5fb
(gdb) run
Starting program: /home/jasmine/tutorials/smashingTheStackForFun/example3test
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x5fb
I'm lost as to why it won't accept this breakpoint. Most of the answers already on here involve not using the * or using the wrong notation; from what I can see, mine looks right, but I could be wrong.
I'm lost as to why it won't accept this breakpoint.
You have a position independent executable, which is relocated to a different address at runtime.
This will work:
(gdb) start
# GDB stops at main
(gdb) break *&main+42
(gdb) continue
See also this answer.
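This works because relocation shifts every address by the same amount, so an instruction's offset from main does not change. In the listing, the call is at 0x5fb and main is at 0x5d1, and 0x5fb - 0x5d1 = 0x2a = 42. A C sketch of that translation; the function name is invented, and the base used in the test is an arbitrary example value, not necessarily what gdb will pick:

```c
#include <stdint.h>

/* Translate a link-time address into a run-time one using a symbol
   whose run-time address is known (here main). The offset from the
   symbol survives relocation, which is what `break *&main+42` relies on. */
uint32_t relocate_via_symbol(uint32_t link_addr, uint32_t link_sym,
                             uint32_t runtime_sym)
{
    return runtime_sym + (link_addr - link_sym);
}
```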

GDB Dis-Flavor set to Intel, but showing AT&T-style

I've set the disassembly-flavor of the gdb debugger to Intel (both as su and as a normal user), but it's still showing the assembly code in AT&T notation:
patrick@localhost:~/Dokumente/Projekte$ gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) break main
Breakpoint 1 at 0x40050e: file firstprog.c, line 5.
(gdb) run
Starting program: /home/patrick/Dokumente/Projekte/a.out
Breakpoint 1, main () at firstprog.c:5
5 for(i=0; i < 10; i++)
(gdb) show disassembly
The disassembly flavor is "intel".
(gdb) info registers
rax 0x400506 4195590
rbx 0x0 0
rcx 0x0 0
rdx 0x7fffffffe2d8 140737488347864
rsi 0x7fffffffe2c8 140737488347848
rdi 0x1 1
rbp 0x7fffffffe1e0 0x7fffffffe1e0
(gdb) info register eip
Invalid register `eip'
I did restart the computer. My OS is Kali Linux amd64.
I have the following questions:
Why is gdb still showing the AT&T notation?
Why is the register EIP (instruction pointer) shown as invalid register?
You are misunderstanding what disassembly flavour means. It means exactly that: what the disassembly looks like when you view machine code in a human-readable(ish) form.
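To print registers (or use registers in any other context), you need to use $reg, such as $rip or $pc, $eax, etc. Also, on amd64 the instruction pointer is rip, not eip, which is why `info register eip` reports an invalid register.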
If I disassemble one of my programs with AT&T syntax, gdb shows this:
0x00000000007378f0 <+0>: push %rbp
0x00000000007378f1 <+1>: mov %rsp,%rbp
0x00000000007378f4 <+4>: sub $0x20,%rsp
0x00000000007378f8 <+8>: movl $0x0,-0x4(%rbp)
0x00000000007378ff <+15>: mov %edi,-0x8(%rbp)
0x0000000000737902 <+18>: mov %rsi,-0x10(%rbp)
=> 0x0000000000737906 <+22>: mov -0x10(%rbp),%rsi
0x000000000073790a <+26>: mov (%rsi),%rdi
0x000000000073790d <+29>: callq 0x737950 <FindLibPath(char const*)>
0x0000000000737912 <+34>: xor %eax,%eax
Then do this:
(gdb) set disassembly-flavor intel
(gdb) disass main
Dump of assembler code for function main(int, char**):
0x00000000007378f0 <+0>: push rbp
0x00000000007378f1 <+1>: mov rbp,rsp
0x00000000007378f4 <+4>: sub rsp,0x20
0x00000000007378f8 <+8>: mov DWORD PTR [rbp-0x4],0x0
0x00000000007378ff <+15>: mov DWORD PTR [rbp-0x8],edi
0x0000000000737902 <+18>: mov QWORD PTR [rbp-0x10],rsi
=> 0x0000000000737906 <+22>: mov rsi,QWORD PTR [rbp-0x10]
0x000000000073790a <+26>: mov rdi,QWORD PTR [rsi]
0x000000000073790d <+29>: call 0x737950 <FindLibPath(char const*)>
0x0000000000737912 <+34>: xor eax,eax
and you can see the difference. But the names of the registers, and how you use registers on the gdb command line, don't change; you need $reg in both cases.

Why is there a difference in the address of a function when using gdb break and gdb print?

When I execute the following commands, I get different addresses for function():
(gdb) break function()
Breakpoint 1 at function() 0x804834a.
(gdb) print function()
Breakpoint 1 at function() 0x8048344.
Why are the two addresses different?
This output can't be correct as shown. You would see something like the following if you had, for example:
int func(void) {
int a = 10;
printf("%d\n", a);
return 1;
}
after loading it into the gdb:
(gdb) p func
$1 = {int (void)} 0x4016b0 <func>
(gdb) b func
Breakpoint 1 at 0x4016b6: file file.c, line 4.
(gdb) disassemble func
Dump of assembler code for function func:
0x004016b0 <+0>: push %ebp
0x004016b1 <+1>: mov %esp,%ebp
0x004016b3 <+3>: sub $0x28,%esp
0x004016b6 <+6>: movl $0xa,-0xc(%ebp)
0x004016bd <+13>: mov -0xc(%ebp),%eax
0x004016c0 <+16>: mov %eax,0x4(%esp)
0x004016c4 <+20>: movl $0x405064,(%esp)
0x004016cb <+27>: call 0x403678 <printf>
0x004016d0 <+32>: mov $0x1,%eax
0x004016d5 <+37>: leave
0x004016d6 <+38>: ret
End of assembler dump.
(gdb)
Here func points to the exact first instruction in the function, push %ebp, but when you set a breakpoint, gdb places it after the stack frame initialization instructions:
0x004016b0 <+0>: push %ebp
0x004016b1 <+1>: mov %esp,%ebp
0x004016b3 <+3>: sub $0x28,%esp
at the point where the instructions of the function body actually begin:
=> 0x004016b6 <+6>: movl $0xa,-0xc(%ebp)
0x004016bd <+13>: mov -0xc(%ebp),%eax
0x004016c0 <+16>: mov %eax,0x4(%esp)
0x004016c4 <+20>: movl $0x405064,(%esp)
0x004016cb <+27>: call 0x403678 <printf>
0x004016d0 <+32>: mov $0x1,%eax
0x004016d5 <+37>: leave
0x004016d6 <+38>: ret
here this instruction:
movl $0xa,-0xc(%ebp) ; 0xa = 10
is this part:
int a = 10;
GDB sets the breakpoint after the function prologue because, before things are properly set up, it could not show the expected state, such as local variables.
break therefore sets the breakpoint at, and prints the address of, the first instruction after the prologue, whereas print shows the address of the actual first instruction in the function.
You can set a breakpoint to actual first instruction by doing break *0x8048344, then observe the value of local variables there and after prologue.
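The gap between the two addresses is exactly the size of the three prologue instructions. A small C sketch that recomputes it from the addresses in the disassembly above (the array and function names are invented). Interestingly, the question's two addresses, 0x8048344 and 0x804834a, also differ by 6 bytes, which is consistent with the same kind of prologue, though that part is an inference:

```c
#include <stddef.h>
#include <stdint.h>

/* Start addresses of func's first instructions, copied from the listing. */
static const uint32_t func_insn[] = {
    0x004016b0, /* push %ebp                                      */
    0x004016b1, /* mov %esp,%ebp                                  */
    0x004016b3, /* sub $0x28,%esp                                 */
    0x004016b6, /* movl $0xa,-0xc(%ebp): first body instruction   */
};

/* Length of instruction i (valid for i = 0..2): next start minus own start. */
uint32_t insn_length(size_t i)
{
    return func_insn[i + 1] - func_insn[i];
}

/* Bytes between `print func` (first instruction) and `break func`. */
uint32_t prologue_bytes(void)
{
    return func_insn[3] - func_insn[0];
}
```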

OSX 64 bit C++ Disassembly line by line

I have been reading through the following series of articles: http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c
The disassembled code shown and the disassembled code I am managing to produce whilst running the same code vary quite significantly and I lack the understanding to explain the differences.
Is there anyone who can step through it line by line and perhaps explain what it's doing at each step? I get the feeling from the searching around I have done that the first few lines have something to do with frame pointers; there also seem to be a few extra lines in my disassembled code that ensure registers are empty before placing new values into them (absent from the code in the article).
I am running this on OSX (the original author is using Windows) using the g++ compiler from within XCode 4. I am really clueless as to whether these variances are due to the OS, the architecture (32 bit vs 64 bit maybe?) or the compiler itself. It could even be the code, I guess: mine is wrapped inside the main function declaration, whereas the original code makes no mention of this.
My code:
int main(int argc, const char * argv[])
{
int x = 1;
int y = 2;
int z = 0;
z = x + y;
}
My disassembled code:
0x100000f40: pushq %rbp
0x100000f41: movq %rsp, %rbp
0x100000f44: movl $0, %eax
0x100000f49: movl %edi, -4(%rbp)
0x100000f4c: movq %rsi, -16(%rbp)
0x100000f50: movl $1, -20(%rbp)
0x100000f57: movl $2, -24(%rbp)
0x100000f5e: movl $0, -28(%rbp)
0x100000f65: movl -20(%rbp), %edi
0x100000f68: addl -24(%rbp), %edi
0x100000f6b: movl %edi, -28(%rbp)
0x100000f6e: popq %rbp
0x100000f6f: ret
The disassembled code from the original article:
mov dword ptr [ebp-8],1
mov dword ptr [ebp-14h],2
mov dword ptr [ebp-20h],0
mov eax, dword ptr [ebp-8]
add eax, dword ptr [ebp-14h]
mov dword ptr [ebp-20h],eax
A full line by line breakdown would be extremely enlightening but any help in understanding this would be appreciated.
All of the code from the original article is in your code, there's just some extra stuff around it. This:
0x100000f50: movl $1, -20(%rbp)
0x100000f57: movl $2, -24(%rbp)
0x100000f5e: movl $0, -28(%rbp)
0x100000f65: movl -20(%rbp), %edi
0x100000f68: addl -24(%rbp), %edi
0x100000f6b: movl %edi, -28(%rbp)
Corresponds directly to the 6 instructions talked about in the article.
There are two major differences between your disassembled code and the article's code.
One is that the article is using the Intel assembler syntax, while your disassembled code is using the traditional Unix/AT&T assembler syntax. Some differences between the two are documented on Wikipedia.
The other difference is that the article omits the function prologue, which sets up the stack frame, and the function epilogue, which destroys the stack frame and returns to the caller. The program he's disassembling has to contain instructions to do those things, but his disassembler isn't showing them. (Actually the stack frame could and probably would be omitted if the optimizer were enabled, but it's clearly not enabled.)
There are also some minor differences: your code is using a slightly different layout for local variables, and your code is computing the sum in a different register.
On the Mac, g++ doesn't support emitting Intel mnemonics, but clang does:
:; clang -S -mllvm --x86-asm-syntax=intel t.c
:; cat t.s
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
push RBP
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset rbp, -16
mov RBP, RSP
Ltmp4:
.cfi_def_cfa_register rbp
mov EAX, 0
mov DWORD PTR [RBP - 4], EDI
mov QWORD PTR [RBP - 16], RSI
mov DWORD PTR [RBP - 20], 1
mov DWORD PTR [RBP - 24], 2
mov DWORD PTR [RBP - 28], 0
mov EDI, DWORD PTR [RBP - 20]
add EDI, DWORD PTR [RBP - 24]
mov DWORD PTR [RBP - 28], EDI
pop RBP
ret
.cfi_endproc
.subsections_via_symbols
If you add the -g flag, the compiler will add debug information including source filenames and line numbers. It's too big to put here in its entirety, but this is the relevant part:
.loc 1 4 14 prologue_end ## t.c:4:14
Ltmp5:
mov DWORD PTR [RBP - 20], 1
.loc 1 5 14 ## t.c:5:14
mov DWORD PTR [RBP - 24], 2
.loc 1 6 14 ## t.c:6:14
mov DWORD PTR [RBP - 28], 0
.loc 1 8 5 ## t.c:8:5
mov EDI, DWORD PTR [RBP - 20]
add EDI, DWORD PTR [RBP - 24]
mov DWORD PTR [RBP - 28], EDI
First of all, the assembler listed as "from original article" is using "Intel" syntax, whereas the "disassembled output" in your post is "AT&T syntax". This explains why the order of arguments to instructions is "back to front" [let's not argue about which is right or wrong, ok?], why register names are prefixed by %, and why constants are prefixed by $. There is also a difference in how memory locations/offsets to registers are referenced: dword ptr [reg+offs] in Intel syntax becomes an l suffix on the instruction plus offs(%reg) in AT&T syntax.
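To make the memory-operand mapping concrete, here is a C sketch that prints one base+displacement operand in both syntaxes. The function names are invented, and the output is simplified (decimal displacement, no special case for a zero offset), so it will not match any particular disassembler character for character:

```c
#include <stdio.h>
#include <string.h>

/* Map an AT&T operand-size suffix to the Intel "PTR" size keyword. */
const char *intel_size(char att_suffix)
{
    switch (att_suffix) {
    case 'b': return "BYTE";
    case 'w': return "WORD";
    case 'l': return "DWORD";
    case 'q': return "QWORD";
    default:  return NULL;
    }
}

/* Format a base+displacement memory operand both ways, e.g.
   AT&T "-20(%rbp)" and Intel "DWORD PTR [rbp-20]".
   Returns 0 on success, -1 on bad input or a too-small buffer. */
int format_mem(char suffix, const char *reg, long disp,
               char *att, size_t att_len, char *intel, size_t intel_len)
{
    const char *size = intel_size(suffix);
    if (size == NULL || reg == NULL || strlen(reg) == 0)
        return -1;
    int a = snprintf(att, att_len, "%ld(%%%s)", disp, reg);
    int i = snprintf(intel, intel_len, "%s PTR [%s%+ld]", size, reg, disp);
    if (a < 0 || i < 0 || (size_t)a >= att_len || (size_t)i >= intel_len)
        return -1;
    return 0;
}
```

In AT&T syntax the size lives in the mnemonic suffix (`movl`), while Intel syntax attaches it to the operand (`DWORD PTR`); the source/destination order swap is a separate difference not shown here.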
The 32-bit vs. 64-bit difference renames some of the registers: %rbp plays the role of ebp in the article code.
The actual offsets (e.g. -20) are different partly because the registers are bigger in 64-bit mode, but also because you have argc and argv as function arguments, which are stored at the start of the function. I have a feeling the original article is actually disassembling a different function than main.
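The offsets in your listing can be reproduced with a very simple model: argc (4 bytes) at -4, argv (an 8-byte pointer, 8-aligned) at -16, then x, y and z at -20, -24 and -28. A C sketch of that model follows. It is an illustration that happens to match this output, not clang's actual frame-layout algorithm, and the names are invented:

```c
#include <stddef.h>

/* One stack slot: its size and required alignment in bytes (align > 0). */
struct slot { size_t size, align; };

/* Assign rbp-relative offsets, growing downward: each slot goes just
   below the previous one, with its distance from rbp rounded up to the
   slot's alignment. Simplified model; real compilers may order or pad
   locals differently. */
void assign_offsets(const struct slot *s, size_t n, long *offset)
{
    size_t depth = 0;                      /* bytes below rbp used so far */
    for (size_t i = 0; i < n; i++) {
        depth += s[i].size;                /* make room for this slot */
        depth = (depth + s[i].align - 1) / s[i].align * s[i].align;
        offset[i] = -(long)depth;          /* slot lives at rbp - depth */
    }
}
```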