Searching for thread start parameters at top of stack - c++

I've inherited some code that worked on Windows 2000 thats using a small piece of assembly code to locate the base address of the stack, then it uses an offset to grab the parameter value passed to the thread start function.
However this doesnt work in Windows 2008 Server. The offset is obviously different.
#define TEB_OFFSET 4
DWORD * pStackBase;
__asm { mov eax,fs:[TEB_OFFSET]}
__asm { mov pStackBase,eax}
// Read the parameter off the stack
#define PARAM_0_OF_BASE_THEAD_START_OFFSET -3
g_dwCtrlRoutineAddr = pStackBase[PARAM_0_OF_BASE_THEAD_START_OFFSET];
After experimenting, I modified the code to look up the stack till it finds the first non-NULL value. hack
DWORD* pStack = pStackBase;
do
{
pStack--;
}
while (*pStack == NULL);
// Read the parameter off the stack
g_dwCtrlRoutineAddr = *pStack;
Its works! But I want a 'correct' solution.
Does anyone know a safer/better solution for getting the parameter passed to the starting function of a thread on Windows 2008 Server?
The thread start function is ntdll!_RtlUserThreadStart
And the first parameter I'm trying to locate is the address of the function kernel32!CtrlRoutine

Bizarre. Looking with the debugger, the first non-null value on the thread's stack is the value of the argument, the 4th argument to CreateThread. Which gets handed to you on a silver platter when you write the thread procedure like this:
DWORD WINAPI threadProc(void* arg) {
// arg is the value you are looking for
// etc..
}
Not sure how that's related to "kernel32!CtrlRoutine" unless this is a thread used in a service.

To get address of kernel32!CtrlRoutine you can get it by RVA, using a table with all (kernel32 version, CtrlRoutine RVA) pairs. It's the most reliable way.

This is also discussed http://www.latenighthacking.com/projects/2003/sendsignal/

Related

linux c++: libaio callback function never called?

I'm on ubuntu 16.10 with g++ 6.2, testing libaio feature:
1. I was trying to test io_set_callback() function
2. I was using main thread and a child thread to talk by a pipe
3. child thread writes periodically (by alarm timer, signal), and main thread reads
I hope to use "callback" function to receive notifications. It didn't work as expected: callback function "read_done" is never called
My questions:
1. I expected my program should call "read_done" function, but actually not.
2. Why the output prints 2 "Enter while" each time?
I hope it only print together with "thread write msg:..."
3. I tried to comment out "io_getevents" line, same result.
I'm not sure if callback mode still need io_getevents? So how to fix my program so it work as I expected? Thanks.
You need to integrate io_queue_run(3) and io_queue_init(3) into your program. Though these aren't new functions, they don't seem to be in the manpages for a bunch of currently shipping distros. Here's a couple of the manpages:
http://manpages.ubuntu.com/manpages/precise/en/man3/io_queue_run.3.html
http://manpages.ubuntu.com/manpages/precise/en/man3/io_queue_init.3.html
And, of course, the manpages don't actually say it, but io_queue_run is what calls the callbacks that you set in io_set_callback.
UPDATED: Ugh. Here's the source for io_queue_run from libaio-0.3.109 on Centos/RHEL (LGPL license, Copyright 2002 Red Hat, Inc.)
int io_queue_run(io_context_t ctx)
{
static struct timespec timeout = { 0, 0 };
struct io_event event;
int ret;
/* FIXME: batch requests? */
while (1 == (ret = io_getevents(ctx, 0, 1, &event, &timeout))) {
io_callback_t cb = (io_callback_t)event.data;
struct iocb *iocb = event.obj;
cb(ctx, iocb, event.res, event.res2);
}
return ret;
}
You'd never want to actually call this without the io_queue_wait call. And, the io_queue_wait call is commented out in the header included with both Centos/RHEL 6 and 7. I don't think you should call this function.
Instead, I think you should incorporate this source into your own code, then modify it to do what you want. You could pretty trivially add a timeout argument to this io_queue_run and just replace your call to io_getevents with it, instead of bothering with io_queue_wait. There's even a patch here that makes io_queue_run MUCH better: https://lwn.net/Articles/39285/).

GetThreadContext registers always return 0xCCCCCCCC

I am trying to utilize GetThreadContext to view what the current debug registers are set to. No matter what program I debug, it returns 0xCCCCCCCC. I am able to successfully set breakpoints ctx.Dr0 and then catch these breaks with a custom exception handler, but if I try to view the address stored at ctx.Dr0, it always appears as 0xCCCCCCCC. Why is that?
Thanks
CONTEXT ctx;
GetThreadContext(GetCurrentThread(),&ctx);
cout << hex << ctx.Eip << endl;
EDIT**
I think I did not ask my question well enough, because at the time I did not realize the error in my thinking. I actually was trying to call GetThreadContext from within the thread which I wanted to get it's context. This doesn't work for obvious reasons. Instead I think CONTEXT ctx = {CONTEXT_FULL} works. The answer that was most helpful was Hans Passant comment below.
You cannot get a valid context for a running thread. You need to suspend the thread you want to get the context for. So, trying to do it in a current thread is not going to work. This is clearly stated in the GetThreadContext() documentation:
You cannot get a valid context for a running thread. Use the SuspendThread function to suspend the thread before calling GetThreadContext.
If you call GetThreadContext for the current thread, the function returns successfully; however, the context returned is not valid.
On 64-bit platforms you can use RtlCaptureContext. This isn't however available to 32-bit platforms, so there you need a different solution. A possible approach is to use assembly as described in this blog post Walking the stack of the current thread.
CONTEXT Context;
ZeroMemory( &Context, sizeof( CONTEXT ) );
Context.ContextFlags = CONTEXT_CONTROL;
__asm
{
Label:
mov [Context.Ebp], ebp;
mov [Context.Esp], esp;
mov eax, [Label];
mov [Context.Eip], eax;
}

winDBG .call equivalent in visual studio

For a homemade debugging tool built as a VS add-in, I need to:
break at some arbitrary point in my application
call into another method and break there (without adding code in that spot before runtime)
run some other command from my VS add-in at that second breakpoint
My first instinct on how to do this hit a wall at Hans' excellent answer here.
My second idea would be to set up the call to the other method from the breakpoint and have it execute when the application is allowed to continue (if you can see another way to do what I need, feel free to point it out, though !).
This would be trivial with WinDBG : just use .call and go. Unfortunately, I need to do this in Visual Studio.
Hence my question : is there some way to do this in VS ? I cannot find an equivalent to .call, nor a way to manipulate the registers and stack and emulate .call myself.
After some investigation, I believe the answer to this question is: there is no equivalent to .call in VS.
The only solution is to emulate the behavior of .call yourself by manipulating the stack pointer, instruction pointer, etc. This will obviously have limitations, e.g. mine will only work for the Microsoft x64 calling convention. Conversion to x86 and its myriad of calling conventions is left as an exercise for the reader ;)
Depending on your actual need, I have found two ways to do this. Remember, this is for calling into a function the next time the debuggee runs (so that you can break into it, since nested breakpoints are not supported). If you just need to call a function without breaking again, you are much better off just using the Immediate window to call it directly !
The easy way:
This will trick VS into thinking the current frame is in a DLL and method of your choosing. This is useful is the Expression Evaluator does not want to work in the DLL you are stopped in and needs to be in a different one.
WARNING: You cannot actually execute the method call you are faking without corrupting your stack (unless the method you are calling is very simple and you are very lucky).
Use the following to do this directly in the debugger via the Immediate window:
#rsp=#rsp-8
*((__int64*)$rsp)=#rip
#rip={,,<DLL to jump in.dll>}<method to call>
Now VS sees the DLL and method you specified as your current frame. Once you are done, use the following to return to the previous state:
#rip=*((__int64*)$rsp)
#rsp=#rsp+8
This can also be automated in a VS add-in by running these statements through EnvDTE.Debugger.GetExpression(), as demonstrated with the other method below.
The hard way:
This will work for actually calling the DLL and function you want and later returning from it cleanly. It is more complicated and more dangerous; any mistake will corrupt your stack.
It is also harder to get right for both debug and release mode, since the optimizer might have done complex things you were not expecting with the code of your callee and caller.
The idea is to emulate the Microsoft x64 calling convention (documented here) and break in the function called. We need to do the following things:
push parameters beyond the first 4 to the stack, in right to left order
create the shadow space on the stack(1)
push the return address, i.e. the current value of RIP
set RIP to the address of the function to call, just like above
save all the registers that the callee may change, and the caller might not expect to change. This basically means saving everything marked 'volatile'here.
set a breakpoint in the callee
run the debuggee
when the debuggee breaks again, perform whatever operations we want
step out
restore the saved registers
return RSP to the correct location (i.e. tear down the shadow space)
remove the breakpoint
(1) 32 bytes of scratch space for the callee to spill the first 4 arguments that are passed by registers (usually; the callee can actually use this however it likes).
Here is a simplified chunk of my VS addin to do this for a very basic case (non member function taking one parameter set to 0 and not touching too many registers). Anything beyond this is again left as an exercise for the reader ;)
EnvDTE90a.Debugger4 dbg = (EnvDTE90a.Debugger4)DTE.Debugger;
string method = "{,,dllname.dll}function";
string RAX = null, RCX = null, flags = null;
// get the address of the function to call and the address to break at (function address + a bit, to skip some prolog and help our breakpoint actually hit)
Expression expr = dbg.GetExpression3(method, dbg.CurrentThread.StackFrames.Item(1), false, false, false, 0);
string addr = expr.Value;
string addrToBreak = (UInt64.Parse(addr.Substring(2), NumberStyles.HexNumber) + 2).ToString();
if (!expr.IsValidValue)
return;
// set a breakpoint in the function to jump into
EnvDTE.Breakpoints bpsAdded = dbg.Breakpoints.Add("", "", 0, 0, "", dbgBreakpointConditionType.dbgBreakpointConditionTypeWhenTrue, "c++", "", 0, addrToBreak, 0, dbgHitCountType.dbgHitCountTypeNone);
if (bpsAdded.Count != 1)
return;
// set up the shadow space and parameter space
// NB: for 1 parameter : 4 words of shadow space, no further parameters... BUT, since the stack needs to be 16 BYTES aligned (i.e. 2 words) and the return address takes a single word, we need to offset by 5 !
dbg.GetExpression3("#rsp=#rsp-8*5", dbg.CurrentStackFrame, false, true, false, 0);
// set up the return address
dbg.GetExpression3("#rsp=#rsp-8*1", dbg.CurrentStackFrame, false, true, false, 0);
dbg.GetExpression3("*((__int64*)$rsp)=#rip", dbg.CurrentStackFrame, false, true, false, 0);
// save the registers
RAX = dbg.GetExpression3("#rax", dbg.CurrentStackFrame, false, true, false, 0).Value;
RCX = dbg.GetExpression3("#rcx", dbg.CurrentStackFrame, false, true, false, 0).Value;
// save the flags
flags = dbg.GetExpression3("#efl", dbg.CurrentStackFrame, false, true, false, 0).Value;
// set up the parameter for the call
dbg.GetExpression3("#rcx=0x0", dbg.CurrentStackFrame, false, true, false, 0);
// set the instruction pointer to our target function
dbg.GetExpression3("#rip=" + addr, dbg.CurrentStackFrame, false, true, false, 0);
dbg.Go(true);
// DO SOMETHING USEFUL HERE ! ;)
dbg.StepOut(true);
// restore all registers
dbg.GetExpression3("#rax=" + RAX, dbg.CurrentStackFrame, false, true, false, 0);
dbg.GetExpression3("#rcx=" + RCX, dbg.CurrentStackFrame, false, true, false, 0);
// restore flags
dbg.GetExpression3("#efl=" + flags, dbg.CurrentStackFrame, false, true, false, 0);
// tear down the shadow space
dbg.GetExpression3("#rsp=#rsp+8*5", dbg.CurrentStackFrame, false, true, false, 0);
}

Trying to print the registers' values from the stack using a pin tool

I am trying to print out the stack in different routines using a pin tool. I am able to get all of the routines but I am a little confused on how to get the addresses stored in the registers in the stack of that routine.
What I have is this:
VOID SETRTN_CONTEXT(CONTEXT * ctxt)
{
ADDRINT reg_address;
PIN_SaveContext(ctxt, &m_ctxt);
reg_address = PIN_GetContextReg(&m_ctxt, REG_STACK_PTR);
}
and in another function I have this piece of code that calls that function:
for(rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn) )
{
RTN_Open(rtn);
RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)SETRTN_CONTEXT,
IARG_CONST_CONTEXT, IARG_THREAD_ID, IARG_END);
RTN_Close(rtn);
}
I am a little confused on when the routine calls that function since I am only getting one result and I get it after attaching with Pin and waiting a couple of seconds.
Any pinheads that might help me on this one? I understand that I need the context from a routine in order to get the registers but I cannot find any function that returns the context as an object...
In your RTN_InsertCall, you add the thread id, and in your SETRTN_CONTEXT function declaration you don't receive the thread id... might want to fix that.
Also, in your analysis routine SETRTN_CONTEXT, you're not actually saving anything external to the application. I could be wrong if m_ctxt is a global variable that you're manipulating elsewhere, which how could that be sound unless you did that for every time the analysis routine ran and in a thread safe way?
Clearly, you want to write the information to some file or output. I recommend using some kind of xml tool; this makes it easy to parse, and if you write your pintools smartly, you can exchange the format of the output by obeying some interface contract.
Also to clarify your confusion, you try to insert the analysis routine to run before every single function in a particular image; every time that function is called in that image, your SETRTN_CONTEXT runs.

NtQuerySystemInformation Hook Failure

After successfully building a trampoline and learning more about process memory space, I tested the trampoline on MessageBoxA. It worked perfectly so I decided to finally put the code to the use it was supposed to be for in the first place, hiding a process by hooking NtQuerySystemInformation. The redirect function should work fine, but the code I used to write the jmp instruction now causes the task manager to crash every time.
BYTE tmpJMP[5] = {0xE9,0x00,0x00,0x00,0x00}; //jmp,A,D,D,R
memcpy(JMP,tmpJMP,5);
DWORD Addr = ((DWORD)func - ((DWORD)oNtQuerySystemInformation + 0x5));
for (int i=0;i<4;++i)
JMP[i+1] = ((BYTE*)&Addr)[i];
if (VirtualProtect((LPVOID)oNtQuerySystemInformation,5,PAGE_EXECUTE_READWRITE,&oldProtect) == FALSE)
MessageBox(NULL,L"Error unprotecting memory",L"",MB_OK);
memcpy(oldBytes,(LPVOID)oNtQuerySystemInformation,5);
if (!WriteProcessMemory(GetCurrentProcess(),(LPVOID)oNtQuerySystemInformation,(LPCVOID)JMP,5,NULL))
MessageBox(NULL,L"Unable to write to process memory space",L"",MB_OK);
VirtualProtect((LPVOID)oNtQuerySystemInformation,5,oldProtect,NULL);
FlushInstructionCache(GetCurrentProcess(),NULL,NULL);
I'm writing to the memory as such. I can't seem to find a problem with the code. I was thinking that maybe the memory changes from API to API but I was told that's incorrect, making me all out confused. Is there anything wrong that you all see? Please be descriptive. I love to learn o3o