GCC/G++ addresses and can't read the registers [duplicate] - c++

This question already has answers here:
GDB Cannot insert breakpoint, Cannot access memory at address XXX? [duplicate]
(2 answers)
Closed 5 years ago.
Dump of assembler code for function main():
0x000000000000071a <+0>: push rbp
0x000000000000071b <+1>: mov rbp,rsp
0x000000000000071e <+4>: sub rsp,0x20
0x0000000000000722 <+8>: mov rax,QWORD PTR fs:0x28
0x000000000000072b <+17>: mov QWORD PTR [rbp-0x8],rax
0x000000000000072f <+21>: xor eax,eax
0x0000000000000731 <+23>: lea rax,[rbp-0x20]
0x0000000000000735 <+27>: mov rdi,rax
0x0000000000000738 <+30>: call 0x764 <Test::Test()>
0x000000000000073d <+35>: lea rax,[rbp-0x20]
0x0000000000000741 <+39>: mov rdi,rax
0x0000000000000744 <+42>: call 0x7ae <Test::a()>
0x0000000000000749 <+47>: mov eax,0x0
0x000000000000074e <+52>: mov rdx,QWORD PTR [rbp-0x8]
0x0000000000000752 <+56>: xor rdx,QWORD PTR fs:0x28
0x000000000000075b <+65>: je 0x762 <main()+72>
0x000000000000075d <+67>: call 0x5f0 <__stack_chk_fail@plt>
0x0000000000000762 <+72>: leave
0x0000000000000763 <+73>: ret
End of assembler dump.
I have a problem. I'm trying to debug the program, but the addresses are weird and I can't read the registers (after start): "The program has no registers now."
This happens with every program I've compiled on my computer.
EDIT:
gef➤ break*0x0000000000000763
Breakpoint 1 at 0x763: file 1.cpp, line 36.
gef➤ r
Starting program: /root/Desktop/Challenges/AdvancedMemoryChallenges/1.bin
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x763
gef➤ info reg $rip
rip 0x7ffff7dd9c20 0x7ffff7dd9c20
gef➤
gef➤ start
[+] Breaking at '{int (void)} 0x55555555471a <main()>'
[!] Command 'entry-break' failed to execute properly, reason: Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x763

0x763 is an address before relocation. (It is unclear whether it is from an object file or the actual executable.)
The addresses of code in a running program are never this low in the address space.
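You need to set a breakpoint on _start or main, start the program, and see which addresses the kernel assigns to the machine code in question. The GDB disassemble command will then print those runtime addresses. For example, the gef start output above places main at 0x55555555471a, which is the load base 0x555555554000 plus the offset 0x71a, so the ret at offset 0x763 is at 0x555555554763.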
GDB automatically disables address space layout randomization (ASLR), so the addresses will be constant as long as you do not change the program, its libraries, or the kernel (which sometimes results in process layout changes, too).

Related

asm inspection of c++ compiled object. what is the meaning of this cs: part [duplicate]

I am writing simple programs and then analyzing them.
Today I've written this:
#include <stdio.h>
int x;
int main(void){
printf("Enter X:\n");
scanf("%d",&x);
printf("You enter %d...\n",x);
return 0;
}
It's compiled into this:
push rbp
mov rbp, rsp
lea rdi, s ; "Enter X:"
call _puts
lea rsi, x
lea rdi, aD ; "%d"
mov eax, 0
call ___isoc99_scanf
mov eax, cs:x <- don't understand this
mov esi, eax
lea rdi, format ; "You enter %d...\n"
mov eax, 0
call _printf
mov eax, 0
pop rbp
retn
I don't understand what cs:x means.
I use Ubuntu x64, GCC 10.3.0, and IDA pro 7.6.
TL;DR: IDA confusingly uses cs: to indicate a RIP-relative addressing mode in 64-bit code.
In IDA, mov eax, x means mov eax, DWORD [x], which in turn means reading a DWORD from the variable x.
For completeness, mov rax, OFFSET x means mov rax, x (i.e. putting the address of x in rax).
In 64-bit mode, displacements are still 32 bits, so in a Position Independent Executable it's not always possible to address a variable by encoding its absolute address (the address is 64 bits wide and may not fit into a sign-extended 32-bit field). And in position-independent code it's not desirable anyway, since the absolute address is only known at load time.
Instead, RIP-relative addressing is used.
In NASM, RIP-relative addressing takes the form mov eax, [REL x], in gas it is mov x(%rip), %eax.
Also, in NASM, if DEFAULT REL is active, the instruction can be shortened to mov eax, [x] which is identical to the 32-bit syntax.
Each disassembler will disassemble a RIP-relative operand differently. As you commented, Ghidra gives mov eax, DWORD PTR [x].
IDA uses mov eax, cs:x to mean mov eax, [REL x]/mov x(%rip), %eax.
;IDA listing, 64-bit code
mov eax, x ;This is mov eax, [x] in NASM and most likely wrong unless your exec is not PIE and always loaded in the low 2 GiB (the 32-bit absolute address is sign-extended)
mov eax, cs:x ;This is mov eax, [REL x] in NASM and idiomatic to 64-bit programs
In short, you can mostly ignore the cs: because that's just the way variables are addressed in 64-bit mode.
Of course, as the listing above shows, the use or absence of RIP-relative addressing tells you whether the program can be loaded anywhere or only in the low 2 GiB.
The cs prefix shown by IDA threw me off.
I can see that it could mentally resemble "code" and thus the rip register but I don't think the RIP-relative addressing implies a cs segment override.
In 32-bit mode, code segments are never writable (a code segment can only be execute-only or execute/read), so an instruction like mov [cs:x], eax will fault.
In this scenario, putting a cs: in front of the operand would be wrong.
In 64-bit mode, segment overrides (other than fs/gs) are ignored (and the read-bit of the code segment is ignored anyway), so the presence of a cs: doesn't really matter because ds and cs are effectively indistinguishable. (Even an ss or ds override doesn't change the #GP or #SS exception for a non-canonical address.)
Probably the AGU doesn't even read the segment shadow registers anymore for segment bases other than fs or gs. (Although even in 32-bit mode, there's a lower latency fast path for the normal case of segment base = 0, so hardware may just let that do its job.)
Still cs: is misleading in my opinion - a 2E prefix byte is still possible in machine code as padding. Most tools still call it a CS prefix, although http://ref.x86asm.net/coder64.html calls it a "null prefix" in 64-bit mode. There's no such byte here, and cs: is not an obvious or clear way to imply RIP-relative addressing.

Why jump into an unavailable address? (GDB)

While debugging some code, I found:
0x08048500 <+0>: push %ebp
0x08048501 <+1>: mov %esp,%ebp
...
0x08048563 <+99>: jmp 0x8048567 <Postion+103> <=== there is no instruction at 0x8048567.
0x08048565 <+101>: dec %edx
0x08048566 <+102>: cmp %bh,%al
0x08048568 <+104>: test %edx,%esp
Q: Why does "jmp 0x8048567" jump into <+103>? There is no instruction there. What's the point? Thanks.
Why does "jmp 0x8048567" jump into <+103>? There is no instruction there.
It's very likely that the instruction at 0x8048567 does exist. You can see it with x/4i 0x8048567.
What is probably happening is that the instruction at 0x8048565 doesn't really exist: the jmp skips over those bytes, so they are never executed. GDB doesn't know that; it continues disassembling one instruction after another and gets out of sync with the real instruction stream.

Purpose of rep stos assembly command in this code from Visual Studio [duplicate]

This question already has answers here:
Can anyone help me interpret this MSVC debug-mode disassembly from a simple Hello World?
(5 answers)
Closed 8 years ago.
Take a look at the following code:
void f()
{
}
I compiled this in Visual Studio 2013, debug, 32-bit mode and looked at the disassembly.
void f()
{
00304CB0 push ebp
00304CB1 mov ebp,esp
00304CB3 sub esp,0C0h
00304CB9 push ebx
00304CBA push esi
00304CBB push edi
00304CBC lea edi,[ebp-0C0h]
00304CC2 mov ecx,30h
00304CC7 mov eax,0CCCCCCCCh
00304CCC rep stos dword ptr es:[edi]
}
00304CCE pop edi
00304CCF pop esi
00304CD0 pop ebx
00304CD1 mov esp,ebp
00304CD3 pop ebp
00304CD4 ret
What is the purpose of the rep stos instruction?
I'm just curious.
The rep stos instruction stores the value in eax ecx times (0x30 here), starting at the address in edi (your local stack in this case) and advancing edi by 4 bytes per store. 0x30 dwords are 0xC0 bytes, exactly the space reserved by sub esp,0C0h. The value in eax is 0xCCCCCCCC, a magic number chosen by Microsoft to mark uninitialized memory. The debugger will catch you if you try to dereference a pointer taken from this memory. This extra diagnostic checking is enabled by the /RTCs option (stack frame checks, included in /RTC1).
Now you might ask why, with an empty function body, any memory is reserved on the local stack. This is because you have Edit and Continue turned on with the /ZI option. The compiler is just setting aside some space in case you decide to use it during a debug session.

C++ CodeBlocks disassembly; Way too much code?

I ran the debugger on CodeBlocks and viewed the disassembly window.
The full source code for the program I debugged is the following:
int main(){}
and the assembly code I saw in the window was this:
00401020 push %ebp
00401021 mov %esp,%ebp
00401023 push %ebx
00401024 sub $0x34,%esp
00401027 movl $0x401150,(%esp)
0040102E call 0x401984 <SetUnhandledExceptionFilter@4>
00401033 sub $0x4,%esp
00401036 call 0x401330 <__cpu_features_init>
0040103B call 0x401740 <fpreset>
00401040 lea -0x10(%ebp),%eax
00401043 movl $0x0,-0x10(%ebp)
0040104A mov %eax,0x10(%esp)
0040104E mov 0x402000,%eax
00401053 movl $0x404004,0x4(%esp)
0040105B movl $0x404000,(%esp)
00401062 mov %eax,0xc(%esp)
00401066 lea -0xc(%ebp),%eax
00401069 mov %eax,0x8(%esp)
0040106D call 0x40192c <__getmainargs>
00401072 mov 0x404008,%eax
00401077 test %eax,%eax
00401079 jne 0x4010c5 <__mingw_CRTStartup+165>
0040107B call 0x401934 <__p__fmode>
00401080 mov 0x402004,%edx
00401086 mov %edx,(%eax)
00401088 call 0x4014f0 <_pei386_runtime_relocator>
0040108D and $0xfffffff0,%esp
00401090 call 0x401720 <__main>
00401095 call 0x40193c <__p__environ>
0040109A mov (%eax),%eax
0040109C mov %eax,0x8(%esp)
004010A0 mov 0x404004,%eax
004010A5 mov %eax,0x4(%esp)
004010A9 mov 0x404000,%eax
004010AE mov %eax,(%esp)
004010B1 call 0x401318 <main>
004010B6 mov %eax,%ebx
004010B8 call 0x401944 <_cexit>
004010BD mov %ebx,(%esp)
004010C0 call 0x40198c <ExitProcess@4>
004010C5 mov 0x4050f4,%ebx
004010CB mov %eax,0x402004
004010D0 mov %eax,0x4(%esp)
004010D4 mov 0x10(%ebx),%eax
004010D7 mov %eax,(%esp)
004010DA call 0x40194c <_setmode>
004010DF mov 0x404008,%eax
004010E4 mov %eax,0x4(%esp)
004010E8 mov 0x30(%ebx),%eax
004010EB mov %eax,(%esp)
004010EE call 0x40194c <_setmode>
004010F3 mov 0x404008,%eax
004010F8 mov %eax,0x4(%esp)
004010FC mov 0x50(%ebx),%eax
004010FF mov %eax,(%esp)
00401102 call 0x40194c <_setmode>
00401107 jmp 0x40107b <__mingw_CRTStartup+91>
0040110C lea 0x0(%esi,%eiz,1),%esi
Is it normal to get this much assembly code from so little C++ code?
By normal, I mean is this close to the average amount of assembly code the MinGW compiler generates relative to the amount of C++ source code I provided above?
Yes, this is fairly typical startup/shutdown code.
Before your main runs, a few things need to happen:
stdin/stdout/stderr get opened
cin/cout/cerr/clog get opened, referring to stdin/stdout/stderr
Any static objects you define get initialized
command line gets parsed to produce argc/argv
environment gets retrieved (maybe)
Likewise, after your main exits, a few more things have to happen:
Anything set up with atexit gets run
Your static objects get destroyed
cin/cout/cerr/clog get destroyed
all open output streams get flushed and closed
all open input streams get closed
Depending on the platform, there may be a few more things as well, such as setting up some default exception handlers (for either C++ exceptions, some platform-specific exceptions, or both).
Note that most of this is fixed code that gets linked into essentially every program, regardless of what it does or doesn't contain. In theory, they can use some tricks (e.g., "weak externals") to avoid linking in some of this code when it isn't needed, but most of what's above is used so close to universally (and the code to handle it is sufficiently trivial) that it's pretty rare to bother going to any work to eliminate this little bit of code, even when it's not going to be used (like your case, where nothing gets used at all).
Note that what you've shown is startup/shutdown code though. It's linked into your program, traditionally from a file named something like crt0 (along with, perhaps, some additional files).
If you look through your file for the code generated for main itself, you'll probably find that it's a lot shorter--possibly as short and simple as just ret. It may be so tiny that you missed the fact that it's there at all though.
This call 0x401318 <main>
is what your code resolved to, basically. main() is a function and there is code surrounding it, often called something like __start and __end.
What you see amounts, in part, to the CRT support code in __start, and cleanup afterward in __end.

MASM Fixing 64 bit Truncation in a DLL

I am working with the Adobe Flash OCX by loading it into my C++ program. The OCX is supposed to be 64-bit, but for some reason it has issues when I compile for the x64 platform. I have read up on this and found that some function likely receives DWORD userData instead of void* userData through some structure and then casts it to an object pointer. This works in a 32-bit environment, but crashes in 64-bit.
The disassembly of the function calls inside the ocx that cause the crash are the following lines:
mov ecx,r8d
The first operation copies only the low 32 bits of R8 (that is, R8D) into ECX.
cmp dword ptr [rcx+11BCh],0
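The second operation accesses memory through the full 64-bit register RCX. Because writing a 32-bit register zero-extends into the 64-bit register, the low 32 bits of RCX hold the low half of the address, but the high 32 bits are zero instead of the pointer's real high half. If the object lives above 4 GiB, the address is wrong, leading to a crash, of course.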
Solution
I have read that one possible solution is to do the following:
Create an asm file containing the following code:
nop
nop
nop
mov ecx,r8d
cmp dword ptr [rcx+11BCh],0
nop
nop
nop
mov rcx,r8d // I've replaced ecx with rcx here
cmp dword ptr [rcx+11BCh],0
Build an obj file using this asm file and MASM.exe
Open the obj file with a hex editor and locate the 90h bytes that encode the nops
In the Flash ocx locate the first string of bytes between the nops and replace it with the new string of bytes that comes after the nops. This will change it from 32 bit to 64 bit function calls.
Problem
I have attempted this by making the following asm file and building it with ml64.exe (I do not have masm.exe, but I think ml.exe is the 32-bit MASM, and this code would only build with ml64.exe, probably because it uses 64-bit-only registers?):
TITLE: Print String Assembly Program (test.asm)
.Code
main Proc
nop
nop
nop
mov ecx,r8d
cmp dword ptr [rcx+11BCh],0
nop
nop
nop
mov rcx,r8
cmp dword ptr [rcx+11BCh],0
main ENDP
END
I had trouble getting it to build (I kept getting errors about instruction length matching) until I changed r8d to r8 in the second section.
I got this obj to build, opened it with a hex editor, and was able to locate the two byte strings. My problem is that when I search the Flash OCX for the first byte string, I cannot find it. It is not there, so I cannot replace it with the second one.
What am I doing wrong?
Thanks!
I made the following asm file and built it with ml64.exe
TITLE: Print String Assembly Program (test.asm)
.Code
main Proc
nop
nop
nop
mov ecx,r8d
cmp dword ptr [rcx+11BCh],0
nop
nop
nop
mov rcx,r8
cmp dword ptr [rcx+11BCh],0
main ENDP
END
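I got this obj to build, opened it with a hex editor, and was able to locate the two byte strings. I found the first byte string in the Flash OCX and changed it to the second one. (The only actual change was a 41 to a 49 in the strings: 41h is a REX prefix with only REX.B set, which lets the instruction name r8d, and 49h additionally sets REX.W, which makes the mov 64-bit.)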