Listing all running applications MASM32 Assembly - c++

Good day! I've been trying to list all the currently running applications and write them to a text file using MASM. I'm new to assembly and am using MSDN as my reference. So far, I know how to use CreateFile, WriteFile, ReadFile and others, but I don't get how Process32First works.
I'm trying to convert the code at this link to MASM (https://msdn.microsoft.com/en-us/library/windows/desktop/ms686701(v=vs.85).aspx), but with no luck: I can't get any output.
I would really appreciate any help! Thank you! Have a nice day.
include \masm32\include\masm32rt.inc
.data
pe32 PROCESSENTRY32 <>
errorCreateTool db "ERROR: CreateToolhelp32Snapshot", 0
errorPF db "ERROR: Process32First", 0
errorOP db "ERROR: OpenProcess", 0
yesMsg db "proceed", 0
.data?
dwPriorityClass dd ?
hProcessSnap HANDLE ?
hProcess HANDLE ?
.code
_start:
push 0
push TH32CS_SNAPPROCESS
call CreateToolhelp32Snapshot
mov hProcessSnap, eax
cmp hProcessSnap, INVALID_HANDLE_VALUE
je _errorCT
mov pe32.dwSize, sizeof PROCESSENTRY32
push offset pe32
push hProcessSnap
call Process32FirstW
cmp eax, ERROR_NO_MORE_FILES
je _errorPF
push offset pe32.szExeFile
call StdOut
mov dwPriorityClass, 0
push offset pe32.th32ProcessID
push FALSE
push PROCESS_ALL_ACCESS
call OpenProcess
cmp eax, 00H ;if I comment this out, the code will proceed
je _errorOpen
push offset pe32.th32ProcessID ;but this doesn't have any value and doesn't print out
call StdOut
push offset yesMsg ;while this prints out on the console
call StdOut
jmp _done
_errorOpen:
push offset errorOP
call StdOut
jmp _done
_errorPF:
push offset errorPF
call StdOut
jmp _done
_errorCT:
push offset errorCreateTool
call StdOut
_done:
push 0
call ExitProcess
end _start

I have experience using that function. All I had to do was update my kernel32.inc and kernel32p.inc, as you suggested. After doing that, I ran makelibs.bat in the masm32 folder, and it worked from there.


add your own instructions using pin

Is it possible to add your own code to the code generated by Intel Pin?
I have been wondering about this for a while. I created a simple tool:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include "pin.H"
// Additional library calls go here
/*********************/
// Output file object
ofstream OutFile;
//static uint64_t counter = 0;
uint32_t lock = 0;
uint32_t unlock = 1;
std::string rtin = "";
// Make this lock if you want to print from _start
uint32_t key = unlock;
void printmaindisas(uint64_t addr, std::string disassins)
{
std::stringstream tempstream;
tempstream << std::hex << addr;
std::string address = tempstream.str();
if (key)
return;
if (addr > 0x700000000000)
return;
std::cout<<address<<"\t"<<disassins<<std::endl;
}
void mutex_lock()
{
key = !lock;
std::cout<<"out\n";
}
void mutex_unlock()
{
key = lock;
std::cout<<"in\n";
}
void Instruction(INS ins, VOID *v)
{
//if
// Insert a call to printmaindisas before every instruction, passing its address and disassembly
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printmaindisas, IARG_ADDRINT, INS_Address(ins),
IARG_PTR, new string(INS_Disassemble(ins)), IARG_END);
//std::cout<<INS_Disassemble(ins)<<std::endl;
}
void Routine(RTN rtn, VOID *V)
{
if (RTN_Name(rtn) == "main")
{
//std::cout<<"Loading: "<<RTN_Name(rtn) << endl;
RTN_Open(rtn);
RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)mutex_unlock, IARG_END);
RTN_InsertCall(rtn, IPOINT_AFTER, (AFUNPTR)mutex_lock, IARG_END);
RTN_Close(rtn);
}
}
KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "mytool.out", "specify output file name");
/*
VOID Fini(INT32 code, VOID *v)
{
// Write to a file since cout and cerr maybe closed by the application
OutFile.setf(ios::showbase);
OutFile << "Count " << count << endl;
OutFile.close();
}
*/
int32_t Usage()
{
cerr << "This is my custom tool" << endl;
cerr << endl << KNOB_BASE::StringKnobSummary() << endl;
return -1;
}
int main(int argc, char * argv[])
{
// It must be called for image instrumentation
// Initialize the symbol table
PIN_InitSymbols();
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
// Open the output file to write
OutFile.open(KnobOutputFile.Value().c_str());
// Set instruction format as intel
// Not needed because my machine is intel
//PIN_SetSyntaxIntel();
RTN_AddInstrumentFunction(Routine, 0);
//IMG_AddInstrumentFunction(Image, 0);
// Add an instruction instrumentation
INS_AddInstrumentFunction(Instruction, 0);
//PIN_AddFiniFunction(Fini, 0);
// Start the program here
PIN_StartProgram();
return 0;
}
If I run the tool on the following C program (which does literally nothing):
int main(void)
{}
it gives me this output:
in
400496 push rbp
400497 mov rbp, rsp
40049a mov eax, 0x0
40049f pop rbp
out
And with the following code:
#include <stdio.h>
int main(void)
{
printf("%s\n", "Hello");
}
prints:
in
4004e6 push rbp
4004e7 mov rbp, rsp
4004ea mov edi, 0x400580
4004ef call 0x4003f0
4003f0 jmp qword ptr [rip+0x200c22]
4003f6 push 0x0
4003fb jmp 0x4003e0
4003e0 push qword ptr [rip+0x200c22]
4003e6 jmp qword ptr [rip+0x200c24]
Hello
4004f4 mov eax, 0x0
4004f9 pop rbp
out
So, my question is, is it possible to add:
4004ea mov edi, 0x400580
4004ef call 0x4003f0
4003f0 jmp qword ptr [rip+0x200c22]
4003f6 push 0x0
4003fb jmp 0x4003e0
4003e0 push qword ptr [rip+0x200c22]
4003e6 jmp qword ptr [rip+0x200c24]
instructions to my first program (the one with no print call), using Pin in the instrumentation routine or the analysis routine, so that I can imitate my second program by adding those instructions dynamically? (I don't want to call printf directly; I want to imitate its behavior. In the future I was thinking of imitating a sanity checker or Intel MPX using Pin, if I could add these check instructions dynamically in some way.)
I looked at the Pin documentation. It has an instruction modification API, but that can only be used to add direct/indirect branches or delete instructions (we can't add new ones).
An analysis routine (or replacement routine) is really just code inserted into the application being profiled. But it appears to me that you want to modify one or more registers of the application context. By default, when an analysis routine executes, the Pin runtime saves the application context on entrance to the analysis routine and then later restores it when the routine returns. This basically allows the analysis routine to execute without any unintended changes to the application. However, Pin provides three ways to modify the application context in an analysis or replacement routine:
1. Pass the IARG_RETURN_REGS argument to the routine. The value returned from the routine is stored into the specified register of the application context. This enables you to change any single register whose size does not exceed the size of ADDRINT, which is the return value type of the routine. This is not supported in Probe mode or with the Buffering API (1). However, it is the most efficient way to change a single register.
2. Pass an IARG_REG_REFERENCE argument for each register you want to modify in the routine. For each such argument, you need to add a parameter in the declaration of the routine of type PIN_REGISTER*. This is not supported in Probe mode or with the Buffering API, but it is the most efficient way to change a couple of registers and supports all registers.
3. Pass the IARG_CONTEXT argument to the routine. You need to add a parameter in the declaration of the routine of type CONTEXT*. Use the context manipulation API to change one or more registers of the application context. For example, you can change the RIP register of the application context using PIN_SetContextReg(ctxt, REG_INST_PTR, NewRipValue). In order for the context changes to take effect, PIN_ExecuteAt must be called, which resumes the execution of the application at the potentially changed RIP with the specified context. This is not supported with the Buffering API and there are restrictions in the Probe mode.
For example, if you want to execute mov edi, 0x400580 in the application context, you can simply store the value 0x400580 in the EDI register of the application context in your analysis routine:
r->dword[0] = 0x400580;
r->dword[1] = 0x0; // See: https://stackoverflow.com/questions/11177137/why-do-x86-64-instructions-on-32-bit-registers-zero-the-upper-part-of-the-full-6
where r is of type PIN_REGISTER*. Or alternatively:
PIN_SetContextReg(ctxt, REG_EDI, 0x400580); // https://stackoverflow.com/questions/38782709/what-is-the-default-type-of-integral-literals-represented-in-hex-or-octal-in-c
Later when application execution resumes, RDI will contain 0x400580.
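To see why the PIN_REGISTER variant writes both 32-bit halves, here is a small portable sketch. It does not use Pin at all: FakeRegister is a hypothetical stand-in that only assumes the register image is 64 bits of little-endian memory (as on x86-64), and the function names are illustrative, not part of any API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 64-bit register image; NOT Pin's PIN_REGISTER.
// Assumes a little-endian host, as on x86-64.
struct FakeRegister {
    std::uint32_t dword[2];
};
static_assert(sizeof(FakeRegister) == sizeof(std::uint64_t), "expected 8 bytes");

// Read the full 64-bit value via memcpy to stay within aliasing rules.
std::uint64_t as_qword(const FakeRegister& r) {
    std::uint64_t q;
    std::memcpy(&q, r.dword, sizeof q);
    return q;
}

// Mimic "mov r32, imm32": write the low half and zero the upper half.
void emulate_mov_r32(FakeRegister& r, std::uint32_t imm) {
    r.dword[0] = imm;
    r.dword[1] = 0;
}

// Writing only the low half leaves whatever was in the upper half.
void write_low_half_only(FakeRegister& r, std::uint32_t imm) {
    r.dword[0] = imm;
}

// Usage sketch: the stale upper bits disappear only when both halves are written.
void demo() {
    FakeRegister rdi{{0xffffffffu, 0xffffffffu}};
    emulate_mov_r32(rdi, 0x400580u);
    assert(as_qword(rdi) == 0x400580u);
}
```

If only dword[0] were written, the full 64-bit register would keep its old upper half, which is not what a real 32-bit mov does on x86-64.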
Note that you can change any valid memory location in your analysis routine whether it belongs to the application or your Pin tool. For example, if the RAX register of the application context contains a pointer, you can directly access the memory location at that pointer just like any other pointer.
Footnotes:
(1) It seems you're not using the Probe mode or the Buffering API.

Is it possible to transfer thread execution to another thread?

I'm currently experimenting with the possibility of transferring a thread's execution from the current thread to another, newly created thread (I hope that's the correct wording). Here's an illustration:
Thread1 is running
Thread1 stops in the middle of the code and creates Thread2
Thread2 continues from the point in the code where Thread1 stopped
EDIT: Updated the example.
#include "stdafx.h"
#include <memory>
#include <windows.h>
#include <cassert>
int _eax, _ebx, _ecx, _edx;
int _ebp, _esp, _esi, _edi;
int _eip;
int _flags;
int _jmp_addr;
bool thread_setup = false;
CONTEXT PrevThreadCtx;
HANDLE thread_handle;
int _newt_esp;
int _newt_ret;
DWORD WINAPI RunTheThread(LPVOID lpParam)
{
// 1000 is more than enough, call to CreateThread() should already return by now.
Sleep(1000);
ResumeThread(thread_handle);
return 0;
}
DWORD WINAPI DummyPrologueEpilogue(LPVOID lpParam)
{
return 123;
}
__declspec(naked) void TransferThread(LPVOID lpParam)
{
//longjmp(jmpbuf, 0);
__asm
{
call get_eip;
cmp[_newt_esp], 0;
mov[_newt_ret], eax;
jz setup_new_thread;
jmp DummyPrologueEpilogue;
get_eip:
mov eax, [esp];
ret;
setup_new_thread:
pushad;
mov[_newt_esp], esp;
mov eax, [_flags];
push eax;
popfd;
mov eax, [_eax];
mov ebx, [_ebx];
mov ecx, [_ecx];
mov edx, [_edx];
mov ebp, [_ebp];
mov esp, [_esp];
mov esi, [_esi];
mov edi, [_edi];
jmp [_eip];
}
}
int _tmain(int argc, _TCHAR* argv[])
{
int x = 100;
char szTest[256];
sprintf_s(szTest, "x = %d", x);
//HideThread();
//setjmp(jmpbuf);
__asm
{
// Save all the registers
mov[_eax], eax;
mov[_ebx], ebx;
mov[_ecx], ecx;
mov[_edx], edx;
mov[_ebp], ebp;
mov[_esp], esp;
mov[_esi], esi;
mov[_edi], edi;
push eax;
// Save the flags
pushfd;
pop eax;
mov[_flags], eax;
// If we are on the *new thread*, jmp to end_asm; otherwise continue...
call get_eip;
mov[_eip], eax;
mov al, byte ptr[thread_setup];
test al, al;
jnz end_asm;
mov eax, [jmp_self];
mov[_jmp_addr], eax;
pop eax;
mov[_newt_esp], 0;
mov byte ptr[thread_setup], 1;
push 0;
push CREATE_SUSPENDED;
push 0;
push TransferThread;
push 0;
push 0;
call CreateThread;
mov [thread_handle], eax;
// Create another thread just to resume 'TransferThread()' (the *new thread*), giving the
// __stdcall call below time to return properly and restore the stack.
// That way the *new thread* does not accidentally pop values from the stack, and the __stdcall
// cleanup code does not accidentally overwrite values newly pushed by the *new thread*.
push 0;
push 0;
push 0;
push RunTheThread;
push 0;
push 0;
call CreateThread;
// Jump to self, consumes CPU
jmp_self:
jmp jmp_self;
nop;
nop;
jmp end_asm;
get_eip:
mov eax, [esp];
ret;
end_asm:
}
// Test stack-based variable
MessageBoxA(0, szTest, "Hello World!", MB_OK);
assert(x == 100);
x += GetCurrentThreadId();
sprintf_s(szTest, "x = %d", x);
HMODULE hMod = LoadLibrary(TEXT("comctl32"));
FreeLibrary(hMod);
try
{
std::unique_ptr<char[]> pTest(new char[256]);
sprintf_s(pTest.get(), 256, "WinApi call test. Previous loadLibrary() call return %X", hMod);
MessageBoxA(0, pTest.get(), "Hello World!", MB_OK);
} catch (...) {}
char *pszTest = (char*) malloc(256);
if (pszTest)
{
float f = 1.0;
f *= (float) GetCurrentThreadId();
sprintf_s(pszTest, 256, "Current Thread ID = %X, Thread handle = %X, FP Test = %f", GetCurrentThreadId(), GetCurrentThread(), f);
MessageBoxA(0, pszTest, "Hello World!", MB_OK);
free( pszTest );
}
// printf() from *new thread* will fail on stkchk()
//printf("Simple test\n");
// Let's terminate this *new* thread and continue the old thread
if (thread_setup)
{
DWORD OldProtect;
thread_setup = false;
VirtualProtect((PVOID)_jmp_addr, 2, PAGE_EXECUTE_READWRITE, &OldProtect);
*(int*)(_jmp_addr) = 0x90909090; // Prev thread not suspended. Just hope this op is atomic.
// Operation below will change the stack pointer
//VirtualProtect((PVOID)_jmp_addr, 2, OldProtect, &OldProtect);
//FlushInstructionCache(GetCurrentProcess(), (PVOID)_jmp_addr, 2);
__asm {
push eax;
mov eax, jmp_self2;
mov[_jmp_addr], eax;
pop eax;
jmp_self2:
jmp jmp_self2;
nop;
nop;
mov esp, [_newt_esp];
popad;
jmp _newt_ret;
}
}
else
{
DWORD OldProtect;
VirtualProtect((PVOID)_jmp_addr, 2, PAGE_EXECUTE_READWRITE, &OldProtect);
*(int*)(_jmp_addr) = 0x90909090; // Prev thread not suspended. Just hope this op is atomic.
}
// Show both thread can be exited cleanly... with some hacks.
DWORD dwStatus;
while (GetExitCodeThread(thread_handle, &dwStatus) && dwStatus == STILL_ACTIVE) Sleep(10);
printf("*New Thread* exited with status %d (Expected 123), Error=%X\n", dwStatus, GetLastError());
assert(dwStatus == 123);
printf("Test printf from original thread!\n");
printf("printf again!\n");
printf("and again!\n");
Sleep( 1000 );
return 0;
}
The code might be a pain to read since it consists mostly of asm, so I added a few comments to help. Now that I have tested it, it seems quite possible, but with some problems. Calling a few WinAPI functions seems fine, but calling printf reliably crashes in the stkchk() function (access denied). I will try an alternative if anyone has a suggestion.
It won't be possible. (EDIT: It might be possible to switch successfully with OS APIs like GetThreadContext, as JS1 mentioned, but the other limitations still apply.)
The thing is, the new thread needs the previous thread's stack to run. You can provide it either by using the old stack directly or by copying the old stack to the new stack. Neither of these is possible: you can't copy the stack because of stack-dependent pointers (frame pointers, for example), and you can't use the old stack because the OS will detect that the thread went outside its own stack and raise a stack overflow or underflow.
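The point about stack-dependent pointers can be illustrated with a portable sketch. MiniStack below is a hypothetical miniature "stack" invented for this example (it is not a real thread stack): one of its fields plays the role of a saved frame pointer that refers back into the same storage. Copying the storage copies the pointer value too, so the copy still refers to the old location.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical miniature "stack": `slots` holds data and `frame_ptr` plays the
// role of a saved frame pointer that points back into the same stack.
struct MiniStack {
    std::array<int, 4> slots{};
    int* frame_ptr = nullptr;
};

// True if p is the address of one of s's slots.
bool points_into(const MiniStack& s, const int* p) {
    for (const int& slot : s.slots) {
        if (&slot == p) return true;
    }
    return false;
}

// Shows that a plain copy keeps the pointer aimed at the OLD stack.
bool copied_pointer_is_stale() {
    MiniStack old_stack;
    old_stack.slots[2] = 42;
    old_stack.frame_ptr = &old_stack.slots[2];

    MiniStack new_stack = old_stack;  // copies data and the raw pointer value
    return new_stack.slots[2] == 42
        && points_into(old_stack, new_stack.frame_ptr)
        && !points_into(new_stack, new_stack.frame_ptr);
}

// The fix-up a copy would need: re-aim the pointer at the same offset in dst.
void rebase(MiniStack& dst, const MiniStack& src) {
    assert(points_into(src, src.frame_ptr));
    const std::ptrdiff_t offset = src.frame_ptr - src.slots.data();
    dst.frame_ptr = dst.slots.data() + offset;
}
```

In this toy case the fix-up is easy because the one pointer field is known. On a real stack there is no reliable way to tell which words are pointers into the stack and which are ordinary integers, which is why copying is not a practical option.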
It might be possible if the OS doesn't detect the stack misplacement. If that's the case, then you can load the old ESP and EBP to use the old stack (like you did). Your current code has a problem (provided it can work at all): you push some registers AFTER you have saved the stack pointer (ESP). When you reload ESP, it's as if you had never pushed anything. ESP really is a special case that needs to be handled carefully. Note that you don't even need to care about the new stack in this case; it will just be ignored. That means you don't need any special naked declaration.
Another note: if you are able to do this, neither thread will be able to terminate unless you restore each thread's previous code flow. The old thread should not use the stack while the new one is running, so it can't terminate, and the new one can't terminate on the old stack. Each stack contains thread-dependent clean-up code at the bottom (or top, for a top-down stack).
As an FYI, I have not tried the following, but it's possible that you might be able to get something to work like this with a naked function (AFAIK only Microsoft compilers):
https://msdn.microsoft.com/en-us/library/5ekezyy2.aspx
There are a significant number of limitations: https://msdn.microsoft.com/en-us/library/4d12973a.aspx but starting a thread with a naked function isn't listed as a limitation. A naked function would remove the prolog/epilog and allow you to try and transfer the context from the previous thread.
You can potentially also do this through an interpreter: basically save the interpreted state of the program and start on a separate thread.
As I can think of no actual use case, I'm not sure why you would ever want to do this.

Read eax register

I would like to know if it is possible to read the eax register of another process immediately after an assembly instruction has been executed.
In my case I have the following assembly code:
mov byte ptr ss:[EBP-4]
call dword ptr ds:[<&MSVCR100.??2@YAPAXI@Z>]
add esp, 4
The idea is to get the eax value just after the "call dword ptr ds:[<&MSVCR100.??2@YAPAXI@Z>]" instruction has been executed.
In my C++ code, I need to retrieve the memory address returned when an object is instantiated in another process.
I don't know if I have been clear enough. Please forgive my bad English.
You could debug the process using a hardware breakpoint.
Example using winapi:
DWORD address = 0x12345678; // address of the instruction after the call
DebugActiveProcess(pid); // PID of target process
CONTEXT ctx = {0};
ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS | CONTEXT_INTEGER;
ctx.Dr0 = address;
ctx.Dr7 = 0x00000001;
SetThreadContext(hThread, &ctx); // hThread with enough permissions
DEBUG_EVENT dbgEvent;
while (true)
{
if (WaitForDebugEvent(&dbgEvent, INFINITE) == 0)
break;
if (dbgEvent.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
dbgEvent.u.Exception.ExceptionRecord.ExceptionCode == EXCEPTION_SINGLE_STEP)
{
if (dbgEvent.u.Exception.ExceptionRecord.ExceptionAddress == (LPVOID)address)
{
GetThreadContext(hThread, &ctx);
DWORD eax = ctx.Eax; // value of EAX right after the call returned
}
}
ContinueDebugEvent(dbgEvent.dwProcessId, dbgEvent.dwThreadId, DBG_CONTINUE);
}
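The value 0x00000001 written to Dr7 above is not arbitrary. As a hedged illustration, the sketch below builds a DR7 value from its fields following the x86 debug-register layout (local-enable bit at position 2*slot, a 2-bit access type at 16 + 4*slot, a 2-bit length at 18 + 4*slot). The helper name and enums are made up for this example and are not part of the Windows API.

```cpp
#include <cassert>
#include <cstdint>

// Access type encoding for the R/Wn field of DR7.
enum class BreakOn : std::uint32_t { Execute = 0, Write = 1, ReadWrite = 3 };

// Length encoding for the LENn field of DR7 (execute breakpoints use One).
enum class BreakLen : std::uint32_t { One = 0, Two = 1, Four = 3 };

// Hypothetical helper: DR7 bits that locally enable one breakpoint slot (0..3).
std::uint32_t dr7_local_enable(unsigned slot, BreakOn type, BreakLen len) {
    assert(slot < 4);
    std::uint32_t value = 1u << (slot * 2);                          // Ln bit
    value |= static_cast<std::uint32_t>(type) << (16 + slot * 4);    // R/Wn
    value |= static_cast<std::uint32_t>(len) << (18 + slot * 4);     // LENn
    return value;
}
```

With slot 0, an execute breakpoint and a length of one byte, the helper yields 0x00000001, matching the constant in the answer. A write watchpoint in another slot would set additional bits in the upper half of the register.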

Why does GetLastError() return 0 or 2 depending on how it is called?

I am using mingw g++ 4.6.1 with -O0, WinXP SP2.
Minimal working example is here.
g++ is configured with --disable-sjlj-exceptions --with-dwarf2.
GetLastError() returns 0 or 2 depending on how the exception is thrown:
throw runtime_error(error_message());
bogus "error code: 0" is printed, and
const string msg = error_message();
throw runtime_error(msg);
prints "error code: 2" as expected.
At first I thought GetLastError() was being invoked twice, but debugging shows it is invoked exactly once, as expected.
What is going on?
It's possible that the code that sets up a throw calls a Win32 API function inside itself somewhere, that resets the Last-Error value to 0. This may be happening before your call to error_message().
Calling GetLastError() does not automatically reset the Last-Error value to 0, so it is safe to call twice.
Whether your compiler/runtime generates code that calls a Win32 API function will be up to your specific runtime. In order to be safe and not depend on this, use the two-statement version:
const string msg = error_message();
throw runtime_error(msg);
Better yet, for future readers of your code it would be useful to call GetLastError() outside error_message():
const string msg = error_message(GetLastError());
throw runtime_error(msg);
This way, readers will see the GetLastError() call immediately after the corresponding Win32 API call, where it belongs.
If you look at the generated assembly code, it becomes clear what's happening. The following C++ code:
hDevice = CreateFileA(path, // drive to open
// etc...
);
if (hDevice == INVALID_HANDLE_VALUE) // cannot open the drive
{
throw runtime_error(error_message());
}
Generates a stretch of assembly code (at least using default optimization):
call _CreateFileA@28 #
LEHE4:
sub esp, 28 #,
mov DWORD PTR [ebp-12], eax # hDevice, D.51673
cmp DWORD PTR [ebp-12], -1 # hDevice,
jne L5 #,
mov DWORD PTR [esp], 8 #,
call ___cxa_allocate_exception # // <--- this call is made between the
# // CreateFile() call and the
# // error_message() call
mov ebx, eax # D.50764,
lea eax, [ebp-16] # tmp66,
mov DWORD PTR [esp], eax #, tmp66
LEHB5:
call __Z13error_messagev #
You see a call made to ___cxa_allocate_exception to allocate some memory block for the exception being thrown. That function call is changing the GetLastError() state.
When the C++ code looks like:
hDevice = CreateFileA(path, // drive to open
// etc...
);
if (hDevice == INVALID_HANDLE_VALUE) // cannot open the drive
{
const string msg = error_message();
throw runtime_error(msg);
}
Then you get the following generated assembly:
call _CreateFileA@28 #
sub esp, 28 #,
mov DWORD PTR [ebp-12], eax # hDevice, D.51674
cmp DWORD PTR [ebp-12], -1 # hDevice,
jne L5 #,
lea eax, [ebp-16] # tmp66,
mov DWORD PTR [esp], eax #, tmp66
call __Z13error_messagev #
LEHE4:
sub esp, 4 #,
mov DWORD PTR [esp], 8 #,
call ___cxa_allocate_exception # // <--- now this happens *after*
// error_message() has been called
which does not call an external function between the failed CreateFile() call and the call to error_message().
This kind of problem is one of the main problems with error handling using some global state like GetLastError() or errno.
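The same hazard can be reproduced portably with errno, which is also per-thread global error state. This is only a sketch: failing_call and runtime_bookkeeping are hypothetical stand-ins (for the failed CreateFile() call and for whatever the runtime does while setting up the throw), not real APIs.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical stand-in for a failing API call: reports failure and records
// the reason in the global error state.
bool failing_call() {
    errno = ENOENT;
    return false;
}

// Hypothetical stand-in for runtime work that happens to reset the error state.
void runtime_bookkeeping() {
    errno = 0;
}

std::string error_message(int code) {
    return "error code: " + std::to_string(code);
}

// Fragile: the error state is read only after unrelated code has run.
std::string fragile() {
    if (!failing_call()) {
        runtime_bookkeeping();
        return error_message(errno);
    }
    return "ok";
}

// Robust: the error state is captured immediately after the failing call.
std::string robust() {
    if (!failing_call()) {
        const int saved = errno;
        runtime_bookkeeping();
        return error_message(saved);
    }
    return "ok";
}
```

Capturing the code into a local variable right after the failing call makes the result independent of anything that runs in between, which is what the two-statement version achieves.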

PeekMessage() throws an unhandled exception (access violation)

Greetings all,
In my application I use the following code:
bool HandleMessages()
{
MSG msg;
if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
{
if (msg.message == WM_QUIT)
return false;
TranslateMessage(&msg);
DispatchMessage(&msg);
}
return true;
}
I thought this was the standard code for message handling in Windows, but now when I try to run the program, I always get an exception at the PeekMessage() call.
Exception message is
Unhandled exception at 0x57a10eed
(msvcr100d.dll) in testing.exe:
0xC0000005: access violation while
reading at Position 0x6666665c.
I'm completely lost here and can't see why it would throw an exception. Does anyone have a hint?
Call Stack:
msvcr100d.dll!__local_unwind2() + 0x48 Bytes Asm
msvcr100d.dll!_except_handler3() + 0xed Bytes Asm
Testing.exe!_except_handler4(_EXCEPTION_RECORD * ExceptionRecord, _EXCEPTION_REGISTRATION_RECORD * EstablisherFrame, _CONTEXT * ContextRecord, void * DispatcherContext) + 0x24 Bytes C
Testing.exe!_except_handler4(_EXCEPTION_RECORD * ExceptionRecord, _EXCEPTION_REGISTRATION_RECORD * EstablisherFrame, _CONTEXT * ContextRecord, void * DispatcherContext) + 0x24 Bytes C
Disassembly:
continue:
57CE0EEA lea esi,[esi+esi*2]
57CE0EED mov ecx,dword ptr [ebx+esi*4]
57CE0EF0 mov dword ptr [esp+0Ch],ecx
57CE0EF4 mov dword ptr [eax+0Ch],ecx
57CE0EF7 cmp dword ptr [ebx+esi*4+4],0
57CE0EFC jne _lu_continue (57CE0F15h)
57CE0EFE push 101h
57CE0F03 mov eax,dword ptr [ebx+esi*4+8]
57CE0F07 call _NLG_Notify (57CE0F55h)
57CE0F0C mov eax,dword ptr [ebx+esi*4+8]
57CE0F10 call _NLG_Call (57CE0F74h)
Show us your call stack. If it's crashing in msvcr100d.dll, then it's happening outside of PeekMessage (before or after the call). You should have good debugging info for this.
Take a look at the this pointer, if applicable.
Do a rebuild all.
Step into the disassembly.
I don't think the callstack you posted is quite sufficient to make anything out from it.
Is there a chance you could be calling HandleMessages() in response to a message? This could result in recursion / stack exhaustion.
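The recursion scenario can be sketched without any Win32 calls. Everything below is a hypothetical mock (Pump, its queue, and dispatch are invented names): a handler that pumps messages from inside message handling nests one level deeper per queued message, while a simple depth guard keeps the nesting at one.

```cpp
#include <cassert>
#include <deque>

// Hypothetical mock of a message pump; none of these names are Win32 APIs.
struct Pump {
    std::deque<int> queue;
    int depth = 0;      // current nesting of handle_messages()
    int max_depth = 0;  // deepest nesting observed
    bool guard = true;  // whether re-entrant calls are rejected

    // Processes at most one message, like the PeekMessage-based function.
    bool handle_messages() {
        if (guard && depth > 0) return true;  // refuse to re-enter
        if (queue.empty()) return true;
        ++depth;
        if (depth > max_depth) max_depth = depth;
        const int msg = queue.front();
        queue.pop_front();
        dispatch(msg);
        --depth;
        return true;
    }

    // A handler that pumps messages itself: the source of the recursion.
    void dispatch(int /*msg*/) {
        handle_messages();
    }
};
```

In a real application the nesting depth is bounded only by how many messages arrive, so an unguarded handler of this shape can exhaust the stack.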