How to find which thread will execute an instruction? - c++

I'm very surprised this hasn't been asked before. I'm trying to put a breakpoint on a specific instruction and read the registers in an already running process (Following this post: Read eax register).
I found the instruction I'm looking for. The problem I've been running into is how to find the thread in which the instruction is going to be executed, so I can call SetThreadContext() on it. This is a multithreaded program, so it's not as simple as looking up the single thread associated with the process.
I tried looking through Cheat Engine's source to see how they did it, however I couldn't find much, so I'm wondering how exactly they did it.
One idea that comes to mind is just setting every thread's context to it, however I'd like to avoid that.
EDIT: Forgot to mention I'm trying to do this with hardware breakpoints (using debug registers)

Unless you already know the answer / can predict the future, you need to set a hardware breakpoint in every thread that might run the instruction you care about.
The debug registers are per-core (and thus per-thread with context-switching), so a core will only actually break if the thread it's executing has its debug registers set to break on that instruction.
It might be easier to use a software breakpoint (0xcc byte replacing the first byte of the instruction) because you just have to store that once and every thread will see it. (x86 has coherent instruction caches; you don't have to invalidate them.)
As Margaret points out, once your breakpoint handler runs, you check the EIP / RIP of every thread, and the ones that are currently at that instruction are the one(s) that have reached the breakpoint and will run that instruction if single-stepped or resumed. (Or an address in your handler, if the handler runs in the context of that thread.)
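The software-breakpoint bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the "code" is a plain byte vector standing in for the target's memory (a real debugger would write through something like WriteProcessMemory or ptrace), and the names SoftwareBreakpoint, insert_breakpoint, remove_breakpoint and thread_is_at are made up for this example. It relies on the fact that after int3 traps, the reported EIP/RIP points one byte past the 0xCC.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kInt3 = 0xCC;  // one-byte x86 breakpoint opcode

// Hypothetical record of one patched location.
struct SoftwareBreakpoint {
    std::size_t offset;     // position of the instruction's first byte
    std::uint8_t original;  // byte that 0xCC replaced
};

// Patch 0xCC over the first byte, remembering what was there.
// The store is visible to every thread, so this is done once, not per thread.
SoftwareBreakpoint insert_breakpoint(std::vector<std::uint8_t>& code,
                                     std::size_t offset) {
    SoftwareBreakpoint bp{offset, code.at(offset)};
    code[offset] = kInt3;
    return bp;
}

// Put the original byte back.
void remove_breakpoint(std::vector<std::uint8_t>& code,
                       const SoftwareBreakpoint& bp) {
    code.at(bp.offset) = bp.original;
}

// A thread that just executed the int3 reports an instruction pointer
// one byte past the breakpoint address.
bool thread_is_at(const SoftwareBreakpoint& bp, std::size_t reported_ip) {
    return reported_ip == bp.offset + 1;
}
```

In the handler, thread_is_at would be evaluated against each thread's instruction pointer to pick out the thread(s) that actually hit the breakpoint.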

Related

Implementing breakpoints that resume safely in multithreaded code

I'm writing a debugger and currently trying to make breakpoints work reliably when multiple threads hit them at the same time. As far as I know, most debuggers implement breakpoints by replacing the first byte of the instruction with 0xCC, and that's how I'm currently doing it as well. However, I don't see any way of restoring the original byte while still being able to stop other threads that are about to hit that breakpoint, without halting all running threads. Does anyone have any information on how that's usually achieved? Is halting all threads really the only solution?
With all threads stopped, you restore that byte, step that one thread only for one instruction, recreate the breakpoint, then resume execution of all threads. If you are using one of the limited hardware debug registers, you can use RF to temporarily ignore the breakpoint for one instruction (see below).
Stopping just the one thread during debugging, while the other threads keep running, is just asking for trouble. Consider how you'd handle hitting the same or a different breakpoint while you were stopped at the first? Or if an exception occurs?
On Intel CPUs, there is a flag in the EFLAGS register (the Resume Flag, RF, bit 16). When set, it lets the next instruction execute without triggering an instruction breakpoint on it. This works only with hardware breakpoints (the debug registers), not with the breakpoint instruction (int3).
Chapter 17 in Volume 3 (the System Programming Guide, available for download from Intel) contains lots of details on the Debug features of Intel IA-32 CPUs.
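As a small illustration of the flag bits involved, here is a sketch that manipulates a saved EFLAGS/RFLAGS value of the kind a debugger reads from and writes back to a thread context. The helper names are invented for this example. The Trap Flag (TF, bit 8) is not mentioned above; it is included on the assumption that it is what performs the single step.

```cpp
#include <cstdint>

constexpr std::uint64_t kTrapFlag   = std::uint64_t{1} << 8;   // TF: single-step
constexpr std::uint64_t kResumeFlag = std::uint64_t{1} << 16;  // RF: skip instruction breakpoint once

// Return the flags value with RF set, so a hardware execution breakpoint on
// the next instruction does not fire again when the thread is resumed.
constexpr std::uint64_t with_resume_flag(std::uint64_t flags) {
    return flags | kResumeFlag;
}

// Return the flags value with TF set, so the thread traps after one instruction.
constexpr std::uint64_t with_single_step(std::uint64_t flags) {
    return flags | kTrapFlag;
}
```

The modified value would be written back into the stopped thread's context before resuming it; how that is done depends on the platform's debugging API.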
I'm aware that temporarily pausing all threads is the common way to solve that. I'm asking if there's any way to avoid doing that.
The first thread to hit your int3 software breakpoint is the one that you want to stop.
If other threads hit it before you can patch it back to the correct contents, resume those threads after removing the software breakpoint. (x86 has coherent instruction caches, so you can safely modify a single code byte without other cores needing to run a fence / isync instruction to re-sync their instruction caches with data cache. This is a harder problem on other ISAs.)
Other threads can see a small interruption.
Of course, if the user puts a breakpoint inside a critical section (with a lock held), or single-steps into a critical section, the other threads will block on that. This is also possible for lockless code that isn't lock-free (in the computer science sense).
Examining and modifying memory while other threads are running is potentially risky. Another thread could unmap memory just before you try to read or modify it. As long as your debugger itself doesn't crash, it's up to the user how much of a mess they want to make, though.

How do you stop a thread and flush its registers into the stack?

I'm creating a concurrent memory reclamation algorithm in C++. Periodically, the stacks of executing mutator threads need to be inspected, so that I can see what references the threads are currently holding. In the process of doing this, I need to also check the registers of the mutator thread to check any references that might be in there.
Clearly many JVMs and C# VMs have no problem doing this as part of their garbage collection cycles. However, I haven't been able to find a definitive solution to this issue.
I can't quite tease apart what is going on in the Boehm garbage collector in order to inspect the root set. If you can (or know how it's done), I'd really like to know.
Ideally I would be able to cause the mutator thread to be interrupted, and execute a piece of handler code which would report its PC and also flush any register-based references into the stack, and then perhaps help finish the collection cycle. I believe that most compilers in most systems will automatically flush the registers when interrupt or signal handlers are called, but I'm not clear on the specifics, or how to access that data. It seems that separate stacks might be used for interrupt and signal handlers. Additionally, I can't find any information about how to target a particular thread, or how to send a signal. Windows does not seem to support this form of signaling anyway, and I would like my system to run on both Linux and Windows on x86-64 processors.
Edit: SuspendThread() is used in some situations, although safepoints seem to be preferred. Any ideas on why? Is there any way to deal with long-lasting I/O waits or other waits for kernel code to return?
I thought this was a very interesting question, so I dug into it a bit. It turns out that the Hotspot JVM uses a mechanism called "safepoints" which cause the threads of the JVM to cooperatively all stop themselves so that the GC can begin. In other words, the thread initiating GC doesn't forcibly stop the other threads, the other threads voluntarily suspend themselves by various clever mechanisms.
I don't believe the JVM scans registers, because a safepoint is defined such that all roots are known (I presume this means in memory).
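To make the cooperative idea concrete, here is a minimal sketch of a polling safepoint built from standard C++ primitives. It is not how HotSpot implements safepoints; the class Safepoint, its member names, and demo_parked_count are all invented for this illustration. Mutator threads call poll() at convenient points, and the collector blocks in stop_the_world() until the expected number of threads have parked themselves. On older toolchains it may need to be built with thread support enabled (for example -pthread).

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Safepoint {
public:
    // Mutators call this at convenient points (loop back-edges, call sites).
    void poll() {
        if (!requested_.load(std::memory_order_acquire)) return;  // fast path
        std::unique_lock<std::mutex> lock(mutex_);
        ++parked_;
        changed_.notify_all();  // tell the collector one more thread has parked
        changed_.wait(lock, [this] { return !requested_.load(); });
        --parked_;
    }

    // Collector: request a stop, then wait until `mutators` threads have parked.
    int stop_the_world(int mutators) {
        std::unique_lock<std::mutex> lock(mutex_);
        requested_.store(true, std::memory_order_release);
        changed_.wait(lock, [&] { return parked_ == mutators; });
        return parked_;
    }

    // Collector: let the parked threads continue.
    void resume_the_world() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requested_.store(false, std::memory_order_release);
        }
        changed_.notify_all();
    }

private:
    std::atomic<bool> requested_{false};
    std::mutex mutex_;
    std::condition_variable changed_;
    int parked_ = 0;  // guarded by mutex_
};

// Starts `mutators` busy threads, stops them at a safepoint, and returns how
// many were parked at the moment a collector could have scanned their roots.
int demo_parked_count(int mutators) {
    Safepoint sp;
    std::atomic<bool> done{false};
    std::vector<std::thread> workers;
    for (int i = 0; i < mutators; ++i)
        workers.emplace_back([&] { while (!done.load()) sp.poll(); });
    int parked = sp.stop_the_world(mutators);  // all mutators are quiescent here
    done.store(true);
    sp.resume_the_world();
    for (auto& w : workers) w.join();
    return parked;
}
```

A thread blocked in a long system call never reaches poll(), which is one reason real runtimes treat such threads specially rather than waiting for them.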
For more information see:
HotSpot Glossary -- which defines safepoints
safepoint.cpp -- the source in HotSpot that implements safepoints
A slide deck that describes safepoints in some detail (look 10 slides or so in)
In regards to your desire to "interrupt" all threads, according to the slide deck I referenced above, thread suspension is "unreliable on Solaris and Linux, e.g., spurious signals." I'm not sure what mechanism even exists for thread suspension that the slides would be referring to.
On Windows you should be able to get this done using SuspendThread (and ResumeThread) along with GetThreadContext (as Hans mentioned). All of these functions take handles to the specific thread you intend to target.
To get a list of all threads in the current process, see this (toolhelp32 works on x64, despite its bad naming scheme).
As a point of interest, one way to flush registers to the stack on 32-bit x86 is the PUSHAD assembly instruction (it is not available in 64-bit mode, where each register has to be pushed individually).

stepping through program with debugger takes a long time

When I debug my program by stepping through it, it sometimes takes a long time for the step to finish. This was not happening in the beginning of the project so most likely it is due to something I have added. Could you give me pointers as to how to remedy this. I did notice one of the problems was due to the main thread trying to paint a widget. My application is multi-threaded (1 background thread and 1 main thread) so I am wondering if it has something to do with that. Your comments are appreciated.
With gdb, just set the scheduler-locking mode to the desired behaviour (for example, set scheduler-locking step).
In this case: "The step mode optimizes for single-stepping. It stops other threads from "seizing the prompt" by preempting the current thread while you are stepping. Other threads will only rarely (or never) get a chance to run when you step."
A guess: Is your "background thread" pegged at near 100% CPU utilization?
Between lines of your main thread, while stepping, the debugger is going to allow the background thread to also "step". If the background thread is pegged, it can run a lot more than a few instructions, causing things to appear unresponsive.
Probably if your second thread is doing that much computation continuously it indicates you've got another problem in your application that you need to fix. If you get that thread under control you will probably see your debugger handling things a lot better.
I asked a very similar question regarding Visual Studio: VS2010 debugger takes an unreasonable amount of time
No real answer came about. You'll find similar questions for past versions of the IDE here as well.

Hibernating/restarting a thread

I'm looking for a way to restart a thread, either from inside that thread's context or from outside the thread, possibly from within another process. (Any of these options will work.) I am aware of the difficulty of hibernating entire processes, and I'm pretty sure that those same difficulties attend to threads. However, I'm asking anyway in the hopes that someone has some insight.
My goal is to pause, save to file, and restart a running thread from its exact context with no modification to that thread's code, or rather, modification in only a small area - i.e., I can't go writing serialization functions throughout the code. The main block of code must be unmodified, and will not have any global/system handles (file handles, sockets, mutexes, etc.) Really down-and-dirty details like CPU registers do not need to be saved; but basically the heap, stack, and program counter should be saved, and anything else required to get the thread running again logically correctly from its save point. The resulting state of the program should be no different, if it was saved or not.
This is for a debugging program for high-reliability software; the goal is to run simulations of the software with various scripts for input, and be able to pause a running simulation and then restart it again later - or get the sim to a branch point, save it, make lots of copies and then run further simulations from the common starting point. This is why the main program cannot be modified.
The main thread language is in C++, and should run on Windows and Linux, however if there is a way to only do this on one system, then that's acceptable too.
Thanks in advance.
I think what you're asking is much more complicated than you think. I am not too familiar with Windows programming but here are some of the difficulties you'll face in Linux.
A saved thread can only be restored from the root process that originally spawned the thread, otherwise the dynamic libraries would be broken. Because of this saving to disk is essentially meaningless. The reason is dynamic libraries are loaded at different address each time they're loaded. The only way around this would be to take complete control of dynamically linking, no small feat. It's possible, but pretty scary.
The suspended thread will have variables in the heap. You'd need to be able to find all globals 'owned' by the thread. The 'owned' state of any piece of the heap cannot be determined. In the future it may be possible with C++0x's garbage collection ABI. You can't just assume the whole heap belongs to the thread to be paused. The main thread uses the heap when creating threads, so blowing away the heap when deserializing the paused thread would break the main thread.
You need to address the issues with globals, and not just the globals created in the threads. Globals (or statics) can be, and often are, created in dynamic libraries.
There are more resources to a program than just memory. You have file handles, network sockets, database connections, etc. A file handle is just a number. Serializing its memory is completely meaningless without the context of the process the file was opened in.
All that said, I don't think the core problem is impossible, just that you should consider a different approach.
Anyway, to try to implement this, the thread to be paused needs to be in a known state. I imagine the thread to be stopped would call a library function meant to halt the process so it could be resumed.
I think the Linux system call fork is your friend. Fork perfectly duplicates a process. Have the system run to the desired point and fork. One fork waits, so that it can fork others. The second fork runs one set of input.
Once it completes, the first fork can fork again. Again, the second fork can run another set of input.
Continue ad infinitum.
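The fork-based branching described above can be sketched like this on Linux or any POSIX system. Sim and branch_from_checkpoint are hypothetical names, and the "simulation" is a toy integer update. The parent runs to the branch point once, then forks one child per input; each child continues from an identical copy of the state. Results are passed back through the exit status purely to keep the sketch short, which limits them to 8 bits; a real harness would more likely use a pipe or a file. Note that in a multithreaded process fork() duplicates only the calling thread, which matters for the original question.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Toy stand-in for the simulated program's state (hypothetical).
struct Sim {
    int state = 0;
    void step(int input) { state = state * 3 + input; }
};

// Run to a common branch point, then explore each input in its own child.
std::vector<int> branch_from_checkpoint(const std::vector<int>& inputs) {
    Sim sim;
    sim.step(1);
    sim.step(2);  // branch point reached: state is 5

    std::vector<int> results;
    for (int input : inputs) {
        pid_t pid = fork();
        if (pid < 0) {            // fork failed
            results.push_back(-1);
            continue;
        }
        if (pid == 0) {           // child: continue from the copied state
            sim.step(input);
            _exit(sim.state & 0xff);
        }
        int status = 0;           // parent: state is untouched, wait for child
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            results.push_back(WEXITSTATUS(status));
        else
            results.push_back(-1);
    }
    return results;
}
```

Because the parent never advances past the branch point, it can keep forking fresh children for as many input sets as needed.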
Threads run in the context of a process. So if you want to do anything like persist a thread state to disk, you need to "hibernate" the entire process.
You will need to serialise the entire set of the process's data. And you'll need to store the current thread execution point. I think serialising the process is do-able (check out Boost.Serialization) but the thread stop point is a lot more difficult. I would put places where it can be stopped through the code, but as you say, you cannot modify the code.
Given that problem, you're looking at virtualising the platform the app is running on, and using its suspend functionality to pause the entire thing. You might find more information about how to do this in the virtualisation vendor's features, eg Xen.
As the whole logical address space of the program is part of the thread's context, you would have to hibernate the whole process.
If you can guarantee that the thread only uses local variables, you could save its stack. It is easy to suspend a thread with pthreads, but I don't see how you could access its stack from outside then.
The way you would have to do this is via VM snapshots; get a copy of VMware Workstation, then you can write code to automate starting/stopping/snapshotting the machine at different points. Any other approach is pretty untenable: while you might be able to freeze and thaw a process, you can't reconstruct the system state it expects (all the stuff that Caspin mentions, like file handles et al.)

Possible to trap write to address (x86 - linux)

I want to be able to detect when a write to memory address occurs -- for example by setting a callback attached to an interrupt. Does anyone know how?
I'd like to be able to do this at runtime (possibly gdb has this feature, but my particular application causes gdb to crash).
If you want to intercept writes to a range of addresses, you can use mprotect() to mark the memory in question as non-writeable, and install a signal handler using sigaction() to catch the resulting SIGSEGV, do your logging or whatever and mark the page as writeable again.
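A minimal sketch of that mprotect()/SIGSEGV approach on Linux might look like the following. The names on_segv and count_trapped_writes are made up for this example, and it watches a single anonymous page allocated with mmap() so that the region is page-aligned. One assumption to flag: mprotect() is not on the POSIX list of async-signal-safe functions, although calling it from a SIGSEGV handler is a common practice on Linux. After the handler makes the page writable, the faulting store is restarted, so only the first write is observed unless the page is protected again.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

static volatile sig_atomic_t g_write_faults = 0;
static void* g_page = nullptr;        // watched page (page-aligned via mmap)
static std::size_t g_page_size = 0;

// SIGSEGV handler: count the write, then unprotect so the store can be retried.
static void on_segv(int, siginfo_t* info, void*) {
    auto addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
    auto base = reinterpret_cast<std::uintptr_t>(g_page);
    if (g_page == nullptr || addr < base || addr >= base + g_page_size) {
        signal(SIGSEGV, SIG_DFL);     // not our page: let the real crash happen
        return;
    }
    g_write_faults = g_write_faults + 1;  // "do your logging or whatever"
    mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE);
}

// Returns how many writes were trapped (expected: one), or -1 on failure.
int count_trapped_writes() {
    g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    g_page = mmap(nullptr, g_page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g_page == MAP_FAILED) { g_page = nullptr; return -1; }

    struct sigaction sa {};
    struct sigaction old {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    g_write_faults = 0;
    mprotect(g_page, g_page_size, PROT_READ);   // arm the trap

    volatile int* p = static_cast<volatile int*>(g_page);
    p[0] = 42;  // faults; handler unprotects the page; the store is retried
    p[1] = 43;  // page is writable now, so this write is not trapped

    int faults = g_write_faults;
    int value = p[0];
    sigaction(SIGSEGV, &old, nullptr);          // restore previous handler
    munmap(g_page, g_page_size);
    g_page = nullptr;
    return value == 42 ? faults : -1;
}
```

To trap every write rather than just the first, the page would have to be re-protected after the store completes, which typically needs single-stepping or a second mechanism.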
What you need is access to the X86 debug registers: http://en.wikipedia.org/wiki/Debug_register
You'll need to set the breakpoint address in one of DR0 to DR3, and then the condition (data write) in DR7. The interrupt will occur and you can run your debug code to read DR6 and find what caused the breakpoint.
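The DR7 encoding can be illustrated with a little bit arithmetic. The sketch below only computes register values; actually loading them into a thread (SetThreadContext, ptrace, and so on) is platform-specific and not shown. The enum and function names are invented for this example. The layout assumed is the documented one: local-enable bit Ln at bit 2n, and for slot n a 2-bit R/W field at bit 16+4n followed by a 2-bit LEN field at bit 18+4n.

```cpp
#include <cstdint>

// R/W field: what kind of access triggers the breakpoint.
enum class BreakOn : std::uint64_t { Execute = 0b00, Write = 0b01, ReadWrite = 0b11 };
// LEN field: size of the watched location (must be One for Execute).
enum class Len : std::uint64_t { One = 0b00, Two = 0b01, Four = 0b11, Eight = 0b10 };

// Return `dr7` with breakpoint slot 0..3 locally enabled for the given
// condition and length. The address itself goes into DR0..DR3 separately.
constexpr std::uint64_t dr7_enable(std::uint64_t dr7, unsigned slot,
                                   BreakOn cond, Len len) {
    slot &= 3;                                        // only DR0..DR3 exist
    const unsigned ctrl_shift = 16 + slot * 4;
    dr7 &= ~(std::uint64_t{0xF} << ctrl_shift);       // clear old R/W and LEN
    dr7 |= static_cast<std::uint64_t>(cond) << ctrl_shift;
    dr7 |= static_cast<std::uint64_t>(len) << (ctrl_shift + 2);
    dr7 |= std::uint64_t{1} << (slot * 2);            // set local enable Ln
    return dr7;
}

// The low four bits of DR6 (B0..B3) report which slots triggered.
constexpr unsigned dr6_triggered_slots(std::uint64_t dr6) {
    return static_cast<unsigned>(dr6 & 0xF);
}
```

For example, a 4-byte write watchpoint in slot 0 corresponds to a DR7 value of 0xD0001.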
If GDB doesn't work, you might try a simpler/smaller debugger such as http://sourceforge.net/projects/minibug/ - if that isn't working, you can at least go through the code and understand how to use the debugging hardware on the processor yourself.
Also, there's a great IBM developer resource on mastering linux debugging techniques which should provide some additional options:
http://www.ibm.com/developerworks/linux/library/l-debug/
A reasonably good article on doing this in Windows is here (I know you're running on Linux, but others might come along to this question wanting to do it in Windows):
http://www.codeproject.com/KB/debug/hardwarebreakpoint.aspx
-Adam
GDB does have that feature: it is called hardware watchpoints, and it is very well supported on Linux/x86:
(gdb) watch *(int *)0x12345678
If your application crashes GDB, build current GDB from CVS Head.
If that GDB still fails, file a GDB bug.
Chances are we can fix GDB faster than you can hack around SIGSEGV handler (provided a good test case), and fixes to GDB help you with future problems as well.
mprotect does have a disadvantage: your memory must be page-boundary aligned. I had my problematic memory on the stack and was not able to use mprotect().
As Adam said, what you want is to manipulate the debug registers. On Windows, I used this: http://www.morearty.com/code/breakpoint/ and it worked great. I also ported it to Mach-O (Mac OS X), and it worked great, too. It was also easy, because Mach-O has thread_set_state(), which is equivalent to SetThreadContext().
The problem with Linux is that it doesn't have such equivalents. I found ptrace, but I thought, this can't be it, there must be something simpler. But there isn't. Yet. I think they are working on a hw_breakpoint API for both kernel and user space. (see http://lwn.net/Articles/317153/)
But when I found this: http://blogs.oracle.com/nike/entry/memory_debugger_for_linux I gave it a try and it wasn't that bad. The ptrace method works by some "outside process" acting as a "debugger", attaching to your program, injecting new values for the debug registers, and terminating with your program continuing with a new hw breakpoint set. The thing is, you can create this "outside process" yourself by using fork(), (I had no success with a pthread), and doing these simple steps inline in your code.
The addwatchpoint code must be adapted to work with 64 bit linux, but that's just changing USER_DR7 etc. to offsetof(struct user, u_debugreg[7]). Another thing is that after a PTRACE_ATTACH, you have to wait for the debuggee to actually stop. But instead of retrying a POKEUSER in a busy loop, the correct thing to do would be a waitpid() on your pid.
The only catch with the ptrace method is that your program can have only one "debugger" attached at a time. So a ptrace attach will fail if your program is already running under gdb control. But just like the example code does, you can register a signal handler for SIGTRAP, run without gdb, and when you catch the signal, enter a busy loop waiting for gdb to attach. From there you can see who tried to write your memory.