I see it is possible in GDB to set a breakpoint which will fire when a specific memory address will be read or written.
I am wondering how it works. Does GDB have a sort of copy of the process memory and check what has changed between each instruction ? Or is it a syscall or kernel feature for that ?
(Intel x86 32 and 64 bits architecture)
I am wondering how it works.
There are two ways: software watchpoints and hardware watchpoints (only available on some architectures).
Software watchpoints work by single-stepping the application, and checking whether the value has changed after every instruction. These are painfully slow (1000x slower), and in practice aren't usable for anything other than a toy program. They also can't detect access, only change of the value in watched location.
Hardware watchpoints require processor support. Intel x86 chips have debug registers, which could be programmed to watch for access (awatch, rwatch) or change (watch) of a given memory location. When the processor detects that the location of interest has been accessed, it raises debug exception, which the OS translates into a signal, and (as usual) a signal is given to the debugger before the target sees it.
HW watchpoints execute at native speed, but (on x86) you can have only up to 4 distinct addresses (in practice, I've never needed more than 2).
Does execution of current instruction fire a watch read at eip address?
It should. You could trivially answer this yourself. Just try it.
Does push on stack fire a write on stack memory address?
Likewise.
Related
I am interested in how a hardware watchpoint is realized. If I watch for a uint8_t variable by watch varName (watch -> write watch) so in case any data is changed on any bit within the range of the memory location it will be detected?
And how is the memory location/range handled by GDB?
I am interested in how a hardware watchpoint is realized.
On platforms which support hardware watchpoints, GDB uses the processor features which enable them (duh!).
For example, on x86, there are special debug registers, which can be programmed to have the processor halt execution when the address bus matches the given address and the access is a write (used for watchpoints), a read (used to implement access watchpoints), etc.
If I watch for a uint8_t variable ... any data is changed on any bit within the range of the memory location it will be detected?
The x86 processors do not write single bits into memory. The least you can write is a byte. The processor will stop after writing the byte (if the debug registers are so configured). GDB can then compare the new value with the old one, and if they differ, stop execution (otherwise GDB will swallow the watchpoint and continue as of nothing happened).
Yes, this can work on single-bit changes -- all it requires is for GDB to remember the old value (which it does when you set up the breakpoint).
Note: there are some gaps in this -- if the value is changed by the kernel (e.g. as a result of read system call), GDB watchpoint will not stop when that happens.
Windows 10, x64 , x86
My current knowledge
Lets say it is quad core, there will be 4 individual program counters which will point to 4 different locations of code for parallel execution.
Each of this program counters indicates where a computer is in its program sequence.
The address it points to changes after a context switch where another threads program counter gets placed onto the program counter to execute.
What I want to do:
Im in Kernel Mode my thread is running on core 1 and I want to read the current instruction pointer of core 2.
Expected Results:
0x203123 is the address of the instruction pointer and this address belongs to this thread and this thread belongs to this process... etc.
Anyone knows how to do it or can give me good book references, links etc...
Although I don't believe it's officially documented, there is a ZwGetContextThread exported from ntdll.dll. Being undocumented, things can change (and I haven't tried it in quite a while) but at least when I last tried it, you called it with a thread handle and a pointer to a CONTEXT structure, and it would return that thread's context.
I'm not certain exactly how up-to-date that is though. It's never mattered to me, so I haven't checked, but my guess would be that the IP in the CONTEXT you get is whatever was saved the last time the thread was suspended. So, if you want something (reasonably) current, you'd use ZwSuspendThread, get the context, then ZwResumeThread to start it running again.
Here I suppose I'm probably supposed to give the standard lines about undocumented function being subject to change, using them being a bad idea, and that you should generally leave all of this alone. Ah well, I been disappointing teachers and other authority figures for years, and I guess I'm not changing right now.
On the other hand, there may be a practical problem here. If you really need data that's really current, this probably isn't going to work very well for you. What it gives you will be kind of current at best. On the other hand, really current is almost a meaningless concept with information that goes out of date every clock cycle.
Anyone knows how to do it or can give me good book references, links etc...
For 80x86 hardware (regardless of operating system); there are only 3 ways to do this (that I know of):
a) send an inter-processor interrupt to the other CPU, and have an interrupt handler that stores the "return EIP" (from its stack) at a known address in memory so that your CPU can read "value of EIP immediately before interrupt" (with synchronization so that your CPU doesn't read before the value is written, etc).
b) put the other CPU into some kind of "debug mode" (single-stepping, last branch recording, ...) so that (either code in a debug exception handler or the CPU's hardware itself) is constantly writing EIP values to memory that you can read.
Of course both of these options will ruin performance, and the value you get will probably be useless (because EIP would've changed after you obtain it but before you can use the obtained value). To ensure the value is still useful; you'd need the other CPU to wait until after you've consumed the obtained value (and are ready for the next value); and to do that you'd have to resort to single-step debugging facilities (with the waiting in the debug exception handler), where you'll be lucky if you can get performance better than a thousand times slower (and can probably improve performance by simply disabling other CPUs completely).
Also note that they still won't accurately tell you EIP in all cases (e.g. if the CPU is in SMM/System Management Mode and is beyond the control of the OS); and I doubt Windows kernel supports any of it (e.g. kernel should support single-stepping of user-space processes/threads to allow debuggers to work, but won't support single-stepping of kernel and will probably lock up the computer due to various "waiting for lock to be released for 6 days" problems).
The last of the 3 options is:
c) Run the OS inside an emulator/simulator instead of running it on real hardware. In that case you can probably modify the emulator/simulator's code to inject EIP values somewhere (maybe some kind of virtual "EIP reporting device"?). This will ruin performance of the emulator/simulator, but you may be able to hide that (e.g. "virtual time inside the emulator passes at a rate of one second per 1000 seconds of real time outside the emulator").
In the past, I have been debugging executables loaded in the internal SRAM of my Cortex M3 (STM32F2) without problems. I have recently been loading my executable to Flash (because of size issues).
Ever since, debugging with GDB has not been working. As I understand, when the executable is in Flash, only hardware breakpoint can be used (as opposed to software breakpoints), and I have six hardware breakpoints. However, when setting just one hardware breakpoint GDB yields an error message:
(gdb) break main
Breakpoint 1 at 0x800019a: file src/main.c, line 88.
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.
(gdb) Warning:
Cannot insert hardware breakpoint 1.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.
What could be going wrong? Have my hardware breakpoints be taken in the background?
Note: I used OpenOCD to load the executable through JTAG.
So, there's basically two ways (plus one really bad way) that breakpoints can be implemented on any given debugger/platform combination:
Use some hardware capabilities ("hardware breakpoints") to cause the processor to trap when it hits a particular address. This is typically restricted to just a couple of breakpoints, if it's available at all.
For each breakpoint that's being set, replace the instruction at the breakpoint with a "trap" instruction of some variety (i.e, an instruction that will break into the debugger). When one of the breakpoints is hit, swap the original instruction back in and single-step once to make it run.
Single-step through the whole program. This one doesn't really count, because it's horrifically slow.
It sounds as though your debugger is only using method #2 ("software breakpoints"). The gotcha with this method is that it requires the program to be writable -- and flash memory isn't writable one instruction at a time, so this technique won't work.
For example, can I break on any change to memory in an address range from <startaddress> to <endaddress>?
How about reads and/or writes?
On Linux/x86, GDB uses the processor debug registers to implement hardware watchpoints. Such watchpoints are fast -- the program runs at full speed, until the processor stops and signals the application when the access or write watchpoint is triggered.
But such watchpoints can only work on 1-word sized data.
Recent Valgrind versions (SVN, but no released versions) implement GDB remote protocol stub, and allow you to set read or write watchpoints over arbitrary memory via special monitor commands.
So if you are on a platform that has Valgrind, and if your application runs acceptably fast under Valgrind, then yes: you can set watchpoints on arbitrary memory regions.
I want to know how does gdb work internally.
e.g. I know a brief idea that it makes use of ptrace() system call to monitor traced program.
But I want to know how it handles signals, how it inserts new code, and other such fabulous things it does.
Check out the GDB Internals Manual, which covers some of the important aspects. There's also an older PDF version of this document.
From the manual:
This document documents the internals of the GNU debugger, gdb. It includes description of gdb's key algorithms and operations, as well as the mechanisms that adapt gdb to specific hosts and targets.
Taken from gdbint.pdf:
It can be done either as hardware breakpoints or as software
breakpoints:
Hardware breakpoints are sometimes available as a builtin debugging features with some chips. Typically these work by having dedicated
register into which the breakpoint address may be stored. If the PC
(shorthand for program counter) ever matches a value in a breakpoint
registers, the CPU raises an exception and reports it to GDB.
Another possibility is when an emulator is in use; many emulators include circuitry that watches the address lines coming out from the
processor, and force it to stop if the address matches a breakpoint's
address.
A third possibility is that the target already has the ability to do breakpoints somehow; for instance, a ROM monitor may do its own
software breakpoints. So although these are not literally hardware
breakpoints, from GDB's point of view they work the same;
Software breakpoints require GDB to do somewhat more work. The basic theory is that GDB will replace a program instruction with a trap,
illegal divide, or some other instruction that will cause an
exception, and then when it's encountered, GDB will take the exception
and stop the program. When the user says to continue, GDB will restore
the original instruction, single-step, re-insert the trap, and
continue on.
The only way you'll find out is by studying the source.
You can also build it and debug it with itself. Step through the code, and you'll know exactly how it does what it does.
Reading GDB source is not for the faint of heart though -- it is chock-full of macros, and heavily uses libbfd, which itself is hard to understand.
It has to, because it is portable (and in particular, builds and works on platforms which do not have ptrace() at all).