How does gdb read the register values of a program / process it's debugging? How are registers associated with a process? - c++

I wrote a short program in C++:
#include <iostream>
using namespace std;

int main() {
    int x = 10;
    int y = 20;
    cout << x + y << endl;
    return 0;
}
Just out of curiosity I wanted to understand what a program does under the hood, so I was playing with gdb and came across the info registers command. When I use info registers in gdb I get output like this:
(gdb) info registers
rax 0x400756 4196182
rbx 0x0 0
rcx 0x6 6
rdx 0x7fffffffd418 140737488344088
rsi 0x7fffffffd408 140737488344072
rdi 0x1 1
rbp 0x7fffffffd320 0x7fffffffd320
rsp 0x7fffffffd320 0x7fffffffd320
r8 0x7ffff7ac1e80 140737348640384
r9 0x7ffff7dcfea0 140737351843488
r10 0x7fffffffd080 140737488343168
r11 0x7ffff773a410 140737344939024
r12 0x400660 4195936
r13 0x7fffffffd400 140737488344064
r14 0x0 0
r15 0x0 0
rip 0x40075a 0x40075a <main+4>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
I understand these are registers and their values, but what I want to know is how/why registers are associated with a process. Shouldn't the values of the registers be changing continuously as different processes are scheduled by the operating system? I looked up the info registers command and this is what I found, but it is still confusing:
info registers -> Prints the names and values of all registers except
floating-point and vector registers (in the selected stack frame).

Registers change all the time. In fact, even the debugger changes register values, as it has to run itself.
However, while you look at your program with a debugger, the debugger suspends your running process. As part of suspending, the CPU state is saved to RAM. The debugger understands this, and can just look at the suspended state in RAM. Say that register R1 was saved to address 0x1234 on suspending, then the debugger can just print the bytes stored at that address.

Each thread/process has its own register values. The user-space "architectural state" (register values) is saved on entering the kernel via a system call or interrupt. (This is true on all OSes).
See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for a look at Linux's system-call entry points, with the hand-written asm that actually saves registers on the process's kernel stack. (Each thread has its own kernel stack, in Linux).
In multi-tasking OSes in general, every process/thread has its own memory space for saving state, so context switches work by restoring the saved state from the thread being switched to. This is a bit of a simplification, because there's kernel state vs. saved user-space state.¹
So any time a process isn't actually running on a CPU core, its register values are saved in memory.
The OS provides an API for reading/writing the saved register state, and memory, of other processes.
In Linux, this API is the ptrace(2) system call; it's what GDB uses to read register values and to single-step. Thus, GDB reads saved register values of the target process from memory, indirectly via the kernel. GDB's own code doesn't use any special x86 instructions, or even load / store from any special addresses; it just makes system calls, because access to another process's state has to go through the kernel. (Linux does also provide process_vm_readv(2)/process_vm_writev(2) for accessing another process's memory, and GDB can read target memory via /proc/<pid>/mem, but those accesses still go through the kernel just like ptrace register accesses.)
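To make the ptrace mechanism concrete, here is a minimal sketch (not GDB's actual code) of a tracer on x86-64 Linux that attaches to a process whose PID is given on the command line and dumps a few of its saved general-purpose registers:
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>      // struct user_regs_struct
#include <sys/wait.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv)
{
    if (argc != 2) { std::fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }
    pid_t pid = (pid_t)std::atoi(argv[1]);

    // Attach: the kernel stops the target, so its user-space state is saved.
    if (ptrace(PTRACE_ATTACH, pid, nullptr, nullptr) == -1) { std::perror("attach"); return 1; }
    waitpid(pid, nullptr, 0);                       // wait until it is actually stopped

    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, pid, nullptr, &regs);    // kernel copies out the saved registers
    std::printf("rip=%#llx rsp=%#llx rax=%#llx\n",
                (unsigned long long)regs.rip,
                (unsigned long long)regs.rsp,
                (unsigned long long)regs.rax);

    ptrace(PTRACE_DETACH, pid, nullptr, nullptr);   // let the target run again
    return 0;
}
(You need permission to trace the target: same user, and a permissive ptrace_scope setting, or root.)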
(I think) If the target process was currently executing (instead of suspended) when another process made a ptrace system call that read or wrote one of its register values, the kernel would have to interrupt it so its current state would be saved to memory. This doesn't normally happen with GDB: it only tries to read register values when it's suspended the target process.
ptrace is also what strace uses to trace system calls. See Playing with ptrace, Part I from Linux Journal. strace ./my_program is fantastically useful for systems programming, especially when making system calls from hand-written asm, to decode the args you're actually passing, and the return values.
Footnotes:
1. In Linux, the actual switch to a new thread happens inside the kernel, from kernel context to kernel context. This saves "only" the integer registers on the kernel stack, sets rsp to the right place in the other thread's kernel stack, then restores the saved registers. So there's a function call that, when it returns, is executing in kernel mode for the new thread, with per-CPU kernel variables set appropriately. User-space state for the new thread is eventually restored the same way it would have been if the system call or interrupt that originally entered the kernel from user-space had returned without calling the scheduler, i.e. from the state saved by the system call or interrupt kernel entry point. Lazy / eager FPU state saving is another complication; the kernel generally avoids touching the FPU so it can avoid saving/restoring FPU state when just entering the kernel and returning back to the same user-space process.

Related

Get the address of an intrinsic function-generated instruction

I have a function that uses the compiler intrinsic __movsq to copy some data from a global buffer into another global buffer upon every call of the function. I'm trying to nop out those instructions once a flag has been set globally and the same function is called again. Example code:
// compiler: MSVC++ VS 2022 in C++ mode; x64
void DispatchOptimizationLoop()
{
    __movsq(g_table, g_localimport, 23);
    // hopefully create a nop after movsq?
    static unsigned char* ptr = (unsigned char*)(&__nop);
    if (!InterlockedExchange8(g_Reduce, 1))
    {
        // point to movsq in memory
        ptr -= 3;
        // nop it out
        ...
    }
    // rest of function here
    ...
}
Basically the function places a nop after the movsq, and then tries to get the address of the placed nop and backtrack by the size of the movsq so that a pointer is pointing to the start of the movsq, so then I can simply cover it with 3 0x90s. I am aware that the line (unsigned char*)(&__nop) is not actually creating a nop because I'm not calling the intrinsic; I'm just trying to show what I want to do.
Is this possible, or is there a better way to store the address of the instructions that need to be nop'ed out in the future?
It's not useful to have the address of a 0x90 NOP somewhere else; all you need is the address of machine code inside your function. Nothing you've written comes remotely close to helping you find that. As you say, &__nop doesn't lead to there being a NOP in your function's machine code which you could offset relative to.
If you want to hard-code offsets that could break with different optimization settings, you could take the address of the start of the function and offset it.
Or you could write the whole function in asm so you can put a label on the address you want to modify. That would actually let you do this safely.
You might get something that happens to work with the GNU C labels-as-values extension, where you can take the address of C goto labels like &&label. E.g. put a mylabel: before the intrinsic, and another label after it for good measure, so you can check that the difference is the expected 3 bytes. If you're lucky, the compiler won't put any other instructions between your labels.
So you can memset((void*)&&mylabel, 0x90, 3) (after an assert on &&mylabel_end - &&mylabel == 3). But I don't think MSVC supports that GNU extension or anything equivalent.
But you can't actually use memset if another thread could be running this at the same time.
And for efficiency, you want a single 3-byte NOP anyway.
And of course you'd have to VirtualProtect the page of machine code containing that instruction to make it writeable. (Assuming the function is 16-byte aligned, it's hopefully impossible for that one instruction near the start to be split across two pages.)
And if other threads could be running this function at the same time, you'd better use an atomic RMW (on the containing dword or qword) to replace the 3-byte instruction with a single 3-byte NOP, otherwise you could have another thread fetch and decode the first NOP, but then fetch a byte of the movsq machine code that hasn't been replaced yet.
Actually a plain mov store would be atomic if it's 4 bytes not crossing an 8-byte boundary. Since there are no other writers of different data, it's fine to load the containing word, splice in the new bytes, and store it back, re-storing the same surrounding bytes you loaded earlier. Normally a non-atomic load+store is not thread-safe, but no other threads could have written a different value in the meantime.
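As a hedged sketch of that idea (PatchWithNop3 is a hypothetical helper, not code from the question; it assumes the 3 patched bytes don't cross an 8-byte boundary, and the cross-modifying-code caveats below still apply):
#include <windows.h>
#include <cstdint>
#include <cstring>

// Replace the 3-byte instruction at `target` with a 3-byte long NOP (0F 1F 00),
// storing the containing aligned 8-byte word in one shot.
void PatchWithNop3(unsigned char* target)
{
    DWORD oldProt;
    // Make the page writable while keeping it executable.
    VirtualProtect(target, 3, PAGE_EXECUTE_READWRITE, &oldProt);

    uint64_t* word = (uint64_t*)((uintptr_t)target & ~(uintptr_t)7);  // containing aligned qword
    size_t offset = (uintptr_t)target & 7;                            // instruction's offset in it

    uint64_t value;
    std::memcpy(&value, word, 8);                          // load the surrounding bytes
    const unsigned char nop3[3] = { 0x0F, 0x1F, 0x00 };    // canonical 3-byte NOP
    std::memcpy((unsigned char*)&value + offset, nop3, 3); // splice the NOP in

    // One aligned 8-byte store; no other thread writes different data to this word,
    // so re-storing the surrounding bytes we loaded above is fine.
    InterlockedExchange64((volatile LONG64*)word, (LONG64)value);

    VirtualProtect(target, 3, oldProt, &oldProt);
    FlushInstructionCache(GetCurrentProcess(), target, 3);
}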
Atomicity of cross-modifying code
I think cross-modifying code has atomicity rules similar to data. But if the instruction spans a 16-byte boundary, code-fetch in another core might have pulled in the first 1 or 2 bytes of it before you atomically replace all 3. So the 2nd and 3rd bytes get treated as either the start of an instruction, or the 2nd + 3rd bytes of a long NOP. Since long NOPs generally start with 0F 1F (0F being an escape byte), if that's not how the movsq machine code starts then decode could desync.
So if cross-modifying code doesn't trigger a pipeline nuke on the other core, it's not safe to do it while another thread might be running the code. Code fetch is usually done in 16-byte chunks but that's not guaranteed. And it's not guaranteed that they're aligned 16-byte chunks.
So you should probably make sure no other threads are running this function while you change the machine code, unless you're very sure of the safety of what you're doing and check each build to make sure the instruction starts at a safe offset, where "safe" accounts for everything that could possibly go wrong.

How does GDB restore instruction after breakpoint

I have read that GDB puts an int 3 (opcode 0xCC) at the wanted address in the target program's memory.
So this operation erases a piece of an instruction (1 byte) in the program's memory.
My question is: how and when does GDB replace the original opcode when the program continues?
When I type disassemble in GDB, I do not see the 0xCC opcode. Is this because GDB knows it is the one that put the 0xCC there?
Is there a way to do a raw disassembly, in order to see exactly which opcodes are loaded in memory at this instant?
How and when does GDB replace the original opcode when the program continues?
I use this as an interview question ;-)
On Linux, to continue past the breakpoint, 0xCC is replaced with the original instruction, and ptrace(PTRACE_SINGLESTEP, ...) is done. When the thread stops on the next instruction, original code is again replaced by the 0xCC opcode (to restore the breakpoint), and the thread continued on its merry way.
On x86 platforms that do not have PTRACE_SINGLESTEP, the trap flag is set in EFLAGS via ptrace(PTRACE_SETREGS, ...) and the thread is continued. The trap flag causes the thread to immediately stop again (on the next instruction, just like PTRACE_SINGLESTEP would).
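A rough sketch of that dance (hypothetical helpers, x86-64 Linux, not GDB's real code; it assumes the tracer has already attached to pid):
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>

// Insert a breakpoint: save the original word at addr and patch its low byte to 0xCC.
long set_breakpoint(pid_t pid, unsigned long addr)
{
    errno = 0;
    long orig = ptrace(PTRACE_PEEKTEXT, pid, (void*)addr, nullptr);
    ptrace(PTRACE_POKETEXT, pid, (void*)addr, (void*)((orig & ~0xffL) | 0xCC));
    return orig;
}

// Continue past a breakpoint that was just hit: restore the original byte,
// single-step, then re-insert 0xCC so the breakpoint stays armed.
// (Backing rip up to addr via PTRACE_GETREGS/PTRACE_SETREGS is omitted here.)
void step_over_breakpoint(pid_t pid, unsigned long addr, long orig)
{
    ptrace(PTRACE_POKETEXT, pid, (void*)addr, (void*)orig);
    ptrace(PTRACE_SINGLESTEP, pid, nullptr, nullptr);
    waitpid(pid, nullptr, 0);                                   // stops after one instruction
    ptrace(PTRACE_POKETEXT, pid, (void*)addr, (void*)((orig & ~0xffL) | 0xCC));
}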
When I type disassemble in GDB, I do not see the 0xCC opcode. Is this because GDB knows it is the one that put the 0xCC there?
Correct. A program could examine and print its own instruction stream, and you can observe breakpoint 0xCC opcodes that way.
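For instance, something along these lines (a sketch; converting a function pointer to a data pointer is only conditionally supported in C++, but works on Linux/x86) prints the first bytes of its own target() function, so with a breakpoint set at *target you'd see cc as the first byte:
#include <cstdio>

void target() { std::puts("in target"); }    // set "break *target" on this in GDB

int main()
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&target);
    for (int i = 0; i < 8; ++i)
        std::printf("%02x ", p[i]);          // first byte shows up as cc while the breakpoint is set
    std::printf("\n");
    target();
    return 0;
}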
Is there a way to do a raw disassembly, in order to see exactly which opcodes are loaded in memory at this instant?
I don't believe there is. You can use (gdb) set debug infrun to observe what GDB is doing to the inferior (being debugged) process.
What I do not understand, in fact, is the exact role of SIGTRAP. Who sends/receives this signal, the debugger or the target program?
Neither: after PTRACE_ATTACH, the kernel stops the inferior, then notifies the debugger that it has done so, by making the debugger's wait return.
I see a wait(NULL) after the ptrace attach. What is the meaning of this wait?
See the explanation above.
Thread-specific breakpoints?
For a thread-specific breakpoint, the debugger inserts a process-wide breakpoint (via 0xCC opcode), then simply immediately resumes any thread which hits the breakpoint, unless the thread is the specific one that you wanted to stop.

Time when the Heap snapshot is taken when dumping Core

We have a POSIX multi-threaded C++ program running on Linux 2.6.32, which core-dumps in one of the threads. Analysing the core file with a cross-compiled gdb-7.2, we see that the faulting instruction is here:
0x11491178 <+208>: lwz r0,8(r9)
and registers in the frame show:
(gdb) info reg
r0 0x0 0
….
r9 0xdeaddead 3735936685
Which makes sense, as r9 has an invalid address value (in fact the heap-scrub pattern we write) in the context of the process/thread.
The confusing bit is that r9 is loaded like this:
0x1149116c <+196>: lwz r9,0(r4)
and r4 contains the value of (first and only) function parameter "data". GDB tells me the following information about data:
(gdb) p data
$6 = (TextProcessorIF *) 0x4b3fe858
(gdb) p *data
$7 = {_vptr.TextProcessorIF = 0x128b5390}
(gdb) info symbol 0x128b5390
vtable for TextProcessorT<unsigned short> + 8 in section .rodata
Which is all correct in this context. So r9 should have had a value of 0x128b5390 instead of the pattern "0xdeaddead" which is written when the memory is freed and given back to the heap.
So why does register r9 contain the scrubbed value when the memory contains a legal object? My theory is that the core contains a snapshot of the memory taken as the process died, which is further down the line than when the actual crash happened. After the SIGSEGV has been raised, this location of heap memory can still be used by other threads, as they keep logging data until the process dies. So it is possible that the memory pointed to by data may have been allocated again and was in use (or had been used) at the time the memory snapshot was taken and preserved in the core.
My question is:
A) Is my theory correct?
B) Am I right in presuming that the heap memory snapshot is not taken at the time of the crash (when the signal is raised) but in the final moments of the process?
C) Can the address/location that caused the SIGSEGV still be used (by other threads)?
Thanks!
Do you use signal handler(s) for SIGSEGV? Are they asynchronous and re-entrant?
See How to write a signal handler to catch SIGSEGV?
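For reference, a minimal handler installed with sigaction might look like the sketch below; it only uses async-signal-safe calls and does not try to resume the faulting code (recovering sanely after SIGSEGV is a separate problem):
#include <signal.h>
#include <unistd.h>

extern "C" void segv_handler(int sig, siginfo_t*, void*)
{
    // Only async-signal-safe calls here: write(2) is safe, printf/malloc are not.
    const char msg[] = "caught SIGSEGV\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(128 + sig);                        // don't return into the faulting code
}

int main()
{
    struct sigaction sa = {};
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;                // pass a siginfo_t to the handler
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, nullptr);

    volatile int* p = nullptr;
    *p = 42;                                 // deliberate fault to exercise the handler
    return 0;
}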

How to handle this exception? (zero esp)

How to handle this exception?
__asm
{
    mov esp, 0
    mov eax, 0
    div eax
}
This is not handled with try/except or SetUnhandledExceptionFilter().
Assuming this is running in an operating system, the operating system will catch the divide by zero, and then ATTEMPT to form an exception/signal stackframe for the application code. However, since the user-mode stack is "bad", it can't.
There is really no way for the operating system to deal with this, other than to kill the application. [Theoretically, it could make up a new stack from some dynamically allocated memory, but it's pretty pointless, as there is no (always working) way for the application itself to recover to a sane state].
Don't set the stack pointer to something that isn't the stack - or if you do store "random" data in the stack pointer register, do not have exceptions. It's the same as "don't aim a gun at your foot and pull the trigger, unless you want to be without your foot".
Edit:
If the code is running in "kernel mode" rather than "user mode", it's even more "game over", since it will "double-fault": the processor enters the divide-by-zero exception handler, which tries to write to the stack, and when it does so, it faults. This is now a "fault within a fault handler", aka a "double fault". The typical setup is to give the double-fault handler a separate stack, which lets that handler itself run. But it's still game over - we don't know how to return to the original fault handler [or how to find out what the original fault handler was].
If there is no "new stack" for the double-fault handler, an x86 processor will triple-fault - typically, a triple fault makes the processor restart [technically, it halts the processor with a special combination of bits signalled on the address bus to indicate that it's a "triple fault"; the typical PC northbridge then resets the processor in recognition that a triple fault is an unrecoverable situation - this is why your PC sometimes simply reboots when you have poor-quality drivers].
It's not a good idea to try to interact with a higher-level language's exception mechanism from embedded assembly. The compiler can do "magic" that you cannot match, and there's no (portable) way to tell the compiler that "this assembly code might throw an exception".

How to use a logical address with an FS or GS base in gdb?

gdb provides functionality to read or write to a specific linear address, for example:
(gdb) x/1wx 0x080483e4
0x80483e4 <main>: 0x83e58955
(gdb)
but how do you specify a logical address? I came across the following instruction:
0x0804841a <+6>: mov %gs:0x14,%eax
How can I read the memory at "%gs:0x14" in gdb, or translate this logical address to a linear address that I could use in the x command?
Note: I know that I could simply read %eax after this instruction, but that is not my concern.
How can I read the memory at "%gs:0x14" in gdb
You can't: there is no way for GDB to know how the segment to which %gs refers has been set up.
or translate this logical address to a linear address that I could use in the x command
Again, you can't do this in general. However, you appear to be on 32-bit x86 Linux, and there you can do it -- %gs is set up to point to the thread descriptor via the set_thread_area system call.
You can do catch syscall set_thread_area in GDB, and examine the parameters (each thread will have one such call). The code to actually do that is here. Once you know how %gs has been set up, just add 0x14 to the base_addr, and you are done.
As answered in Using GDB to read MSRs, this is possible with gdb 8, using the registers $fs_base and $gs_base.
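For example, with GDB 8 or later you can use the pseudo-register directly in an expression (on 32-bit glibc, %gs:0x14 typically holds the stack-protector canary):
(gdb) print/x $gs_base
(gdb) x/wx $gs_base + 0x14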
I think the easiest way to do this is to read the content of the EAX register, since as you can see, the value at %gs:0x14 is moved to EAX.
In GDB, set a breakpoint at the address right after 0x0804841a with break. For example
break *0x0804841e
Then run the program and you can print the contents of EAX register with
info registers eax