gdb: how to disassemble a non-structure piece of code?

Having this in nasm:
section .data
cod: db '0123456789ABCDEF'
section .text
global _start
_start:
nop
mov rax, 0x1122334455667788
mov rdi, 1
mov rdx, 1
mov rcx, 64
.loop:
push rax
sub rcx, 4
sar rax, cl
and rax, 0xf
lea rsi, [cod + rax]
mov rax, 1
push rcx
syscall
pop rcx
pop rax
test rcx, rcx
jnz .loop
mov rax, 60
xor rdi, rdi
syscall
Then in gdb:
disas _start.loop
gives:
Attempt to extract a component of a value that is not a structure.
How can I disas the loop in gdb?
PS: I would also like to know what gdb means by "structure" here. I suppose it has nothing to do with C structs, but rather with function frames, so that gdb can see where a function starts and ends? In my case it is a loop, not a function, so it does not have a frame. Is that what the error means?
EDIT:
I have tried stepping in gdb:
(gdb) break *_start+1
Breakpoint 1, 0x0000000000401001 in _start ()
(gdb) n
Single stepping until exit from function _start,
which has no line number information.
And then the output is:
1122334455667788[Inferior 1 (process 6257) exited normally]
BUT I have not seen any instruction from the <_start.loop> loop; it just exits from _start.
I do not know whether it is because of the .loop NASM local label or because it does not have "struct behavior", but how can I see the .loop piece of code in gdb before exiting from _start?
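For orientation while reading the disassembly, here is a rough C sketch of what the .loop block in the question computes. It is only an illustration: the helper name hex_digits and the output buffer are assumptions made for this sketch, whereas the NASM code writes each digit to stdout with a write syscall.

```c
#include <stdint.h>

/* Lookup table, same role as `cod` in the NASM source. */
static const char cod[] = "0123456789ABCDEF";

/*
 * Hypothetical helper (not part of the original program): stores the
 * 16 hex digits of `value` in `out`, most significant nibble first,
 * and NUL-terminates the result.
 */
void hex_digits(uint64_t value, char out[17])
{
    int i = 0;

    /* rcx starts at 64 and is decremented by 4 before each shift,
       so the shift counts are 60, 56, ..., 4, 0. */
    for (int shift = 60; shift >= 0; shift -= 4) {
        /* sar rax, cl ; and rax, 0xf ; lea rsi, [cod + rax] */
        out[i++] = cod[(value >> shift) & 0xf];
    }
    out[i] = '\0';
}
```

Called with 0x1122334455667788, the buffer ends up holding the same 16 characters that the program printed in the session above.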

Related

Mystery: casting a GNU C label pointer to a function pointer, with inline asm to put a ret in that block. Block being optimized away?

Firstly: This code is considered to be of pure fun, please do not do anything like this in production. We will not be responsible for any harm caused to you, your company or your reindeer after compiling and executing this piece of code in any environment. The code below is not safe, not portable and is plainly dangerous. Be warned. Long post below. You were warned.
Now, after the disclaimer: Let's consider the following piece of code:
#include <stdio.h>
int fun()
{
return 5;
}
typedef int(*F)(void) ;
int main(int argc, char const *argv[])
{
void *ptr = &&hi;
F f = (F)ptr;
int c = f();
printf("TT: %d\n", c);
if(c == 5) goto bye;
//else goto bye; /* <---- This is the most important line. Pay attention to it */
hi:
c = 5;
asm volatile ("movl $5, %eax");
asm volatile ("retq");
bye:
return 66;
}
To begin with, we have the function fun, which I created purely for reference, to get the generated assembly code.
Then we declare a function pointer F to functions taking no parameters and returning an int.
Then we use the not so well known GCC extension https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html to get the address of a label hi, and this works in clang too. Then we do something evil, we create a function pointer F called f and initialize it to be the label above.
Then the worst of all, we actually call this function and assign its return value to a local variable called c, and then we print it out.
The following is an if to check whether the value assigned to c is actually the one we need, and if yes go to bye so that the application exits normally, with exit code 66. If that can be considered a normal exit code.
The next line is commented out, but I can say this is the most important line in the entire application.
The piece of code after the label hi is to assign 5 to the value of c, then two lines of assembly to initialize the value of eax to 5 and to actually return from the "function" call. As mentioned, there is a reference function, fun which generates the same code.
And now we compile this application, and run it on our online platform: https://gcc.godbolt.org/z/K6z5Yc
It generates the following assembly (with -O1 turned on; -O0 gives a similar result, albeit a bit longer):
# else goto bye is COMMENTED OUT
fun:
mov eax, 5
ret
.LC0:
.string "TT: %d\n"
main:
push rbx
mov eax, OFFSET FLAT:.L3
call rax
mov ebx, eax
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
cmp ebx, 5
je .L4
.L3:
movl $5, %eax
retq
.L4:
mov eax, 66
pop rbx
ret
The important lines are mov eax, OFFSET FLAT:.L3 where the L3 corresponds to our hi label, and the line after that: call rax which actually calls it.
And runs like:
ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 66
TT: 5
Now, let's revisit the most important line in the application and uncomment it.
With -O0 we get the following assembly, generated by gcc:
# else goto bye is UNCOMMENTED
# even gcc -O0 "knows" hi: is unreachable.
fun:
push rbp
mov rbp, rsp
mov eax, 5
pop rbp
ret
.LC0:
.string "TT: %d\n"
main:
push rbp
mov rbp, rsp
sub rsp, 48
mov DWORD PTR [rbp-36], edi
mov QWORD PTR [rbp-48], rsi
mov QWORD PTR [rbp-8], OFFSET FLAT:.L4
mov rax, QWORD PTR [rbp-8]
mov QWORD PTR [rbp-16], rax
mov rax, QWORD PTR [rbp-16]
call rax
mov DWORD PTR [rbp-20], eax
mov eax, DWORD PTR [rbp-20]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
cmp DWORD PTR [rbp-20], 5
nop
.L4:
mov eax, 66
leave
ret
and the following output:
ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 66
So, as you can see, our printf was never called; the culprit is the line mov QWORD PTR [rbp-8], OFFSET FLAT:.L4 where L4 actually corresponds to our bye label.
And from what I can see from the generated assembly, not a piece of code from the part after hi was added into the generated code.
But at least the application runs and at least has some code for comparing c to 5.
On the other hand, clang with -O0 generates the following nightmare, which by the way crashes:
# else goto bye is UNCOMMENTED
# clang -O0 also doesn't emit any instructions for the hi: block
fun: # @fun
push rbp
mov rbp, rsp
mov eax, 5
pop rbp
ret
main: # @main
push rbp
mov rbp, rsp
sub rsp, 48
mov dword ptr [rbp - 4], 0
mov dword ptr [rbp - 8], edi
mov qword ptr [rbp - 16], rsi
mov qword ptr [rbp - 24], 1
mov rax, qword ptr [rbp - 24]
mov qword ptr [rbp - 32], rax
call qword ptr [rbp - 32]
mov dword ptr [rbp - 36], eax
mov esi, dword ptr [rbp - 36]
movabs rdi, offset .L.str
mov al, 0
call printf
cmp dword ptr [rbp - 36], 5
jne .LBB1_2
jmp .LBB1_3
.LBB1_2:
jmp .LBB1_3
.LBB1_3:
mov eax, 66
add rsp, 48
pop rbp
ret
.L.str:
.asciz "TT: %d\n"
If we turn on some optimization, for example O1, we get from gcc:
# else goto bye is UNCOMMENTED
# gcc -O1
fun:
mov eax, 5
ret
.LC0:
.string "TT: %d\n"
main:
sub rsp, 8
mov eax, OFFSET FLAT:.L3
call rax
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
.L3:
mov eax, 66
add rsp, 8
ret
and the application crashes, which is sort of understandable. Again, the compiler has entirely removed our hi section (mov eax, OFFSET FLAT:.L3 goes tiptoe to L3, which corresponds to our bye section) and unfortunately decided that it's a good idea to increase rsp before a ret, so we can be sure to end up somewhere totally different from where we need to be.
And clang delivers something even more dubious:
# else goto bye is UNCOMMENTED
# clang -O1
fun: # @fun
mov eax, 5
ret
main: # @main
push rax
mov eax, 1
call rax
mov edi, offset .L.str
mov esi, eax
xor eax, eax
call printf
mov eax, 66
pop rcx
ret
.L.str:
.asciz "TT: %d\n"
1 ? How on earth did clang end up with this?
To some level I understand that the compiler decided that dead code after an if where both if and else go to the same location is not needed, but here my knowledge and insight stops.
So now, dear C and C++ gurus, assembly aficionados and compiler crushers, here comes the question:
Why?
Why do you think the compiler decided that the two labels should be considered equivalent once we added the else branch, and why did clang put 1 there? And last but not least: someone with a deep understanding of the C standard could maybe point out where this piece of code deviated so badly from normality that we ended up in this really really weird situation.
"someone with a deep understanding of the C standard could maybe point out where this piece of code deviated so badly from normality that we ended up in this really really weird situation."
You think the ISO C standard has anything to say about this code? It's chock full of UB and GNU extensions, notably pointers to local labels.
Casting a label pointer to a function pointer and calling through it is obviously UB. The GCC manual doesn't say you can do that. It's also UB to goto a label in another function.
You were only able to make that work by tricking the compiler into thinking that block might be reached so it's not removed, then using GNU C Basic asm statements to emit a ret instruction there.
GCC and clang remove dead code even with optimization disabled; e.g. if(0) { ... } doesn't emit any instructions to implement the ...
Also note that the c=5 in hi: compiles with optimization fully disabled (and else goto bye commented) to asm like movl $5, -20(%rbp). i.e. using the caller's RBP to modify local variables in the stack frame of the caller. So you have a nested function.
GNU C allows you to define nested functions that can access the local vars of their parent scope. (If you liked the asm you got from your experiment, you'll love the executable trampoline of machine-code that GCC stores to the stack with mov-immediate if you take a pointer to a nested function!)
asm volatile ("movl $5, %eax"); is missing a clobber on EAX. You step on the compiler's toes which would be UB if this statement was ever reached normally, rather than as if it were a separate function.
The use-case for GNU C Basic asm (no constraints / clobbers) is instructions like cli (disable interrupts), not anything involving integer registers, and definitely not ret.
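To illustrate the clobber point, here is a minimal sketch of how the same movl $5 could be written as Extended asm with an output operand, so the compiler knows which register is written. It assumes GCC or clang with the default AT&T syntax on an x86 target. The function name five_via_asm and the fallback branch for other targets are made up for this illustration.

```c
/* Hypothetical helper: returns 5, produced by an inline-asm mov. */
int five_via_asm(void)
{
    int v;
#if defined(__x86_64__) || defined(__i386__)
    /* "=r"(v) lets the compiler pick the output register and tells it
       that the register is written, so nothing is modified behind
       its back (unlike a Basic asm statement that hard-codes %eax). */
    __asm__ ("movl $5, %0" : "=r"(v));
#else
    v = 5;  /* assumption: plain C fallback so the sketch builds elsewhere */
#endif
    return v;
}
```

Because the result is an ordinary C value, the compiler does its own return sequence; no ret appears inside the asm statement.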
If you want to define a callable function using inline asm, you can use asm("") at global scope, or as the body of an __attribute__((naked)) function.
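A sketch of the global-scope variant, assuming GCC or clang targeting x86-64 Linux (ELF, default AT&T syntax). The symbol name asm_five and the plain C fallback for other platforms are assumptions made for this example, not something from the question.

```c
#if defined(__x86_64__) && defined(__linux__)
/* Basic asm at global scope: the assembler sees a complete function,
   so the compiler never has to reason about a stray ret inside main. */
__asm__(
    ".pushsection .text\n"
    ".globl asm_five\n"
    ".type asm_five, @function\n"
    "asm_five:\n"
    "    movl $5, %eax\n"      /* return value goes in eax */
    "    ret\n"
    ".size asm_five, .-asm_five\n"
    ".popsection\n"
);

/* Prototype so that C code can call the asm-defined symbol. */
int asm_five(void);
#else
/* Assumption: fallback so the sketch still compiles on other targets. */
int asm_five(void) { return 5; }
#endif
```

A caller then uses it like any other function, for example printf("TT: %d\n", asm_five());, with a normal call and return and no label-pointer cast.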

std::ifstream crashes in release build on Windows with exit code 0xC0000409: Unknown software exception

I'm reading a file using std::ifstream:
printf("Before stream initialization\n");
ifstream stream(file_path, ios::binary);
printf("Stream initialized\n");
ifstream::pos_type position = stream.tellg();
auto file_size = position;
printf("Position acquired\n");
However, the program crashes in the release mode of the binary. Here is the compiled assembly code snippet:
.text:0000000000413411 lea rcx, aBeforeStreamIn ; "Before stream initialization\n"
.text:0000000000413418 mov rbx, rax
.text:000000000041341B call _ZL6printfPKcz ; printf(char const*,...)
.text:000000000041341B ; } // starts at 41340C
.text:0000000000413420 lea rdi, [rsp+878h+var_248]
.text:0000000000413428 lea rcx, [rdi+0D8h] ; this
.text:000000000041342F mov [rsp+878h+var_820], rdi
.text:0000000000413434 call _ZNSt8ios_baseC1Ev ; std::ios_base::ios_base(void)
.text:0000000000413439 xor r8d, r8d
.text:000000000041343C mov rax, cs:_refptr__ZTVSt9basic_iosIcSt11char_traitsIcEE
.text:0000000000413443 xor edx, edx
.text:0000000000413445 mov [rsp+878h+var_90], r8w
.text:000000000041344E pxor xmm0, xmm0
.text:0000000000413452 movaps [rsp+878h+var_88], xmm0
.text:000000000041345A movaps [rsp+878h+var_78], xmm0
.text:0000000000413462 mov [rsp+878h+var_98], 0
.text:000000000041346E add rax, 10h
.text:0000000000413472 mov [rsp+878h+var_170], rax
.text:000000000041347A mov rax, cs:_refptr__ZTTSt14basic_ifstreamIcSt11char_traitsIcEE
.text:0000000000413481 mov rsi, [rax+8]
.text:0000000000413485 mov rcx, [rax+10h]
.text:0000000000413489 mov rax, [rsi-18h]
.text:000000000041348D mov [rsp+878h+var_248], rsi
.text:0000000000413495 mov [rsp+878h+var_7E8], rcx
.text:000000000041349D mov [rsp+878h+var_7F0], rsi
.text:00000000004134A5 mov [rsp+rax+878h+var_248], rcx
.text:00000000004134AD mov [rsp+878h+var_240], 0
.text:00000000004134B9 mov rcx, [rsi-18h]
.text:00000000004134BD add rcx, rdi
.text:00000000004134C0 ; try {
.text:00000000004134C0 call _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E ; std::basic_ios<char,std::char_traits<char>>::init(std::basic_streambuf<char,std::char_traits<char>> *)
.text:00000000004134C0 ; } // starts at 4134C0
.text:00000000004134C5 mov rax, cs:_refptr__ZTVSt14basic_ifstreamIcSt11char_traitsIcEE
.text:00000000004134CC lea rcx, [rdi+10h]
.text:00000000004134D0 add rax, 18h
.text:00000000004134D4 mov [rsp+878h+var_248], rax
.text:00000000004134DC mov rax, cs:_refptr__ZTVSt14basic_ifstreamIcSt11char_traitsIcEE
.text:00000000004134E3 add rax, 40h
.text:00000000004134E7 mov [rsp+878h+var_170], rax
.text:00000000004134EF ; try {
.text:00000000004134EF call _ZNSt13basic_filebufIcSt11char_traitsIcEEC1Ev ; std::basic_filebuf<char,std::char_traits<char>>::basic_filebuf(void)
.text:00000000004134EF ; } // starts at 4134EF
.text:00000000004134F4 lea rdx, [rdi+10h]
.text:00000000004134F8 lea rcx, [rdi+0D8h]
.text:00000000004134FF ; try {
.text:00000000004134FF call _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E ; std::basic_ios<char,std::char_traits<char>>::init(std::basic_streambuf<char,std::char_traits<char>> *)
.text:0000000000413504 lea rcx, [rdi+10h]
.text:0000000000413508 mov r8d, 0Eh
.text:000000000041350E mov rdx, rbx
.text:0000000000413511 call _ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode ; std::basic_filebuf<char,std::char_traits<char>>::open(char const*,std::_Ios_Openmode)
.text:0000000000413516 mov rdx, [rsp+878h+var_248]
.text:000000000041351E add rdi, [rdx-18h]
.text:0000000000413522 test rax, rax
.text:0000000000413525 mov rcx, rdi
.text:0000000000413528 jz loc_414688
.text:000000000041352E xor edx, edx
.text:0000000000413530 call _ZNSt9basic_iosIcSt11char_traitsIcEE5clearESt12_Ios_Iostate ; std::basic_ios<char,std::char_traits<char>>::clear(std::_Ios_Iostate)
.text:0000000000413530 ; } // starts at 4134FF
.text:0000000000413535
.text:0000000000413535 loc_413535: ; CODE XREF: PointerSearcher::parse_pointer_map(void)+1363↓j
.text:0000000000413535 lea rcx, aStreamInitiali ; "Stream initialized\n"
.text:000000000041353C ; try {
.text:000000000041353C call _ZL6printfPKcz ; printf(char const*,...)
In my function it crashes at this line:
.text:0000000000413504 lea rcx, [rdi+10h]
The output is:
Before stream initialization
Process finished with exit code -1073741819 (0xC0000409)
The stacktrace is:
std::locale::operator=(std::locale const&)
std::ios_base::_M_init()
std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*)
MyExecutable::myFunction()
The crash only happens in the Windows binary. The binary works in release mode for Linux. I'm using the MinGW compiler to compile the Windows binary and the compilation flags are:
-fopenmp -O3 -DNDEBUG
They're the default CMake release build flags. I also made sure the passed file_path is correct.
gdb says:
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004a2521 in std::locale::operator=(std::locale const&) ()
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004a2521 in std::locale::operator=(std::locale const&) ()
[Thread 48616.0xc508 exited with code 3221225477]
[Thread 48616.0xc510 exited with code 3221225477]
[Thread 48616.0xc638 exited with code 3221225477]
[Inferior 1 (process 48616) exited with code 030000000005]
The compiler version:
"C:\Program Files\mingw-w64\x86_64-8.1.0-win32-seh-rt_v6-rev0\mingw64\bin\x86_64-w64-mingw32-gcc.exe" --version
x86_64-w64-mingw32-gcc.exe (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 8.1.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Does anyone have an idea what went wrong and how to fix it?
This seems to be a MinGW compiler bug, since the same exception does not occur when the code is compiled with MSVC in Visual Studio.

GDB Dis-Flavor set to Intel, but showing AT&T-style

I've set the disassembly flavor of gdb to Intel (both as su and as a normal user), but it's still showing the assembly code in AT&T notation:
patrick@localhost:~/Dokumente/Projekte$ gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) break main
Breakpoint 1 at 0x40050e: file firstprog.c, line 5.
(gdb) run
Starting program: /home/patrick/Dokumente/Projekte/a.out
Breakpoint 1, main () at firstprog.c:5
5 for(i=0; i < 10; i++)
(gdb) show disassembly
The disassembly flavor is "intel".
(gdb) info registers
rax 0x400506 4195590
rbx 0x0 0
rcx 0x0 0
rdx 0x7fffffffe2d8 140737488347864
rsi 0x7fffffffe2c8 140737488347848
rdi 0x1 1
rbp 0x7fffffffe1e0 0x7fffffffe1e0
(gdb) info register eip
Invalid register `eip'
I did restart the computer. My OS is Kali Linux amd64.
I have the following questions:
Why is gdb still showing the AT&T notation?
Why is the register EIP (instruction pointer) shown as invalid register?
You are misunderstanding what disassembly flavour means. It means exactly that: what the disassembly looks like when you view machine code in a human-readable(ish) form.
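To print registers (or use registers in any other context), you need to use $reg, such as $rip or $pc, $eax, etc. On amd64 the instruction pointer is rip rather than eip, which is why eip is reported as an invalid register.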
If I disassemble one of my programs with at&t syntax, gdb shows this:
0x00000000007378f0 <+0>: push %rbp
0x00000000007378f1 <+1>: mov %rsp,%rbp
0x00000000007378f4 <+4>: sub $0x20,%rsp
0x00000000007378f8 <+8>: movl $0x0,-0x4(%rbp)
0x00000000007378ff <+15>: mov %edi,-0x8(%rbp)
0x0000000000737902 <+18>: mov %rsi,-0x10(%rbp)
=> 0x0000000000737906 <+22>: mov -0x10(%rbp),%rsi
0x000000000073790a <+26>: mov (%rsi),%rdi
0x000000000073790d <+29>: callq 0x737950 <FindLibPath(char const*)>
0x0000000000737912 <+34>: xor %eax,%eax
Then do this:
(gdb) set disassembly-flavor intel
(gdb) disass main
Dump of assembler code for function main(int, char**):
0x00000000007378f0 <+0>: push rbp
0x00000000007378f1 <+1>: mov rbp,rsp
0x00000000007378f4 <+4>: sub rsp,0x20
0x00000000007378f8 <+8>: mov DWORD PTR [rbp-0x4],0x0
0x00000000007378ff <+15>: mov DWORD PTR [rbp-0x8],edi
0x0000000000737902 <+18>: mov QWORD PTR [rbp-0x10],rsi
=> 0x0000000000737906 <+22>: mov rsi,QWORD PTR [rbp-0x10]
0x000000000073790a <+26>: mov rdi,QWORD PTR [rsi]
0x000000000073790d <+29>: call 0x737950 <FindLibPath(char const*)>
0x0000000000737912 <+34>: xor eax,eax
and you can see the difference. But the names of registers and how you use registers on the gdb command line don't change; you need a $reg in both cases.

GDB instruction level single step over doesn't work with stripped elf?

I'm debugging a stripped ELF with gdb. When I use "ni" to step over a function, GDB still steps into the function. How can I single-step over a call at the instruction level?
(gdb) x/5i $pc
0x2495: call 0xd900
0x249a: mov DWORD PTR [esp],eax
0x249d: call 0x300b3 <dyld_stub_chdir>
0x24a2: mov eax,DWORD PTR [ebp+0xc]
0x24a5: mov DWORD PTR [esp+0x4],eax
(gdb) ni
0x0000d900 in ?? ()
(gdb)

NASM Segmentation Fault ( strchrnul )

Need help with some NASM code. I have to find out whether intgr1 mod intgr2 == 0, but can't use DIV.
I am getting a segmentation fault. From gdb I found:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7aacd2a in strchrnul () from /lib/x86_64-linux-gnu/libc.so.6
My program:
;nasm -f elf64 main.nasm
;gcc -o main main.o -lc
section .text
global main
extern scanf
extern printf
section .data
request1: db "Dividendo: ", 0
request2: db "Divisor: ", 0
message1: db "Eh divisivel", 0
message2: db "Nao eh divisivel", 0
formatin: db "%d", 0
intgr1: times 4 db 0 ; 32-bits integer = 4 bytes
intgr2: times 4 db 0 ;
main:
push request1 ;imprime pedido dividendo
call printf
add esp, 4
push intgr1 ;scanf do dividendo
push formatin
call scanf
add esp, 8
push request2 ;imprime pedido divisor
call printf
add esp, 4
push intgr2 ;scanf do divisor
push formatin
call scanf
add esp, 8
mov eax, [intgr1]
mov ebx, [intgr2]
jmp L1
L1: cmp eax, ebx ;compara dividendo divisor
jb L2 ;se < entao vai pra l2
sub eax,ebx ;dividendo:=dividendo-divisor
jmp L1 ;vai pra L1
L2: cmp eax, 0 ;compara dividendo e 0
je L3 ;se igual vai para l3
jmp L4 ;se nao vai para l4
L3: push message1 ;imprime que eh divisivel
call printf
add esp, 4
L4:push message2 ;imprime que nao eh
call printf
add esp, 4
MOV AL, 1 ;termina o programa
MOV EBX, 0
INT 80h
Anyone have an idea of what is wrong?
Thanks.
nasm -f elf64 main.nasm
You're assembling a 64-bit app? We don't push parameters in 64-bit land; we pass them in registers.
Calling conventions: look at the line in the table for x86-64; it tells you which registers Linux uses in its calling convention: RDI, RSI, RDX, RCX, R8, R9, XMM0–7
Your printf should be:
mov rdi, request1 ; 1st argument: the format string
xor rax, rax ; al = 0: no vector registers used for varargs
call printf
Your printf call needs a format parameter, or you can have problems in the future. Learn the correct way now and have fewer problems later.
Likewise, scanf is the same:
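mov rsi, intgr2 ; 2nd argument: address where the integer is stored
mov rdi, formatin ; 1st argument: the format string
xor rax, rax ; al = 0: no vector registers used for varargs
call scanf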
Since you're linking with the C library, you need to call exit so the library can do its cleanup.
xor rdi, rdi ; exit status 0
call exit ; assumes exit is declared with extern, like printf and scanf
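As a cross-check for the register-based version, the intended logic can be sketched in C; a compiler targeting x86-64 Linux passes the printf and scanf arguments in RDI and RSI as described above. This is only a sketch: the helper names is_divisible and verdict are made up here, and the divisor == 0 guard is an addition, since the loop in the question would never terminate in that case. Also note that in the question's code L3 falls through into L4, so both messages would be printed; the sketch picks exactly one.

```c
/*
 * Hypothetical helper: divisibility test by repeated subtraction,
 * mirroring labels L1/L2 in the question (no DIV used).
 * Unsigned types match the unsigned jb comparison in the asm.
 */
int is_divisible(unsigned dividend, unsigned divisor)
{
    if (divisor == 0)             /* added guard, not in the original */
        return 0;
    while (dividend >= divisor)   /* L1: cmp eax, ebx / jb L2 */
        dividend -= divisor;      /* sub eax, ebx */
    return dividend == 0;         /* L2: cmp eax, 0 */
}

/* Hypothetical helper: selects the single message to print. */
const char *verdict(unsigned dividend, unsigned divisor)
{
    return is_divisible(dividend, divisor) ? "Eh divisivel"
                                           : "Nao eh divisivel";
}
```

The asm equivalent of choosing one message is a jump past L4 after the L3 printf call, so that control does not fall through.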