Identifying data segment in Win32/Win64 - c++

I am a DLL loaded into the memory space of some process. I am one of a number of DLLs present in this process, some loaded dynamically and some statically.
There is a "data gem" left for me to discover somewhere in this process's address space, and we will assume it is in a "data" segment (i.e. not in some weird self-modifying code).
I need to find it. I need to search memory, e.g. with memcmp(), but I do not know where to start looking. Maybe I can brute-force search from 0 to many gigabytes, but that will raise access violations on unreadable or execute-only pages, and maybe I can handle those exceptions so that I do not bring the whole process down. But it sounds dodgy.
Is there a more intelligent way to search? Off the top of my head, I could look into the data segments of the main executable, because there is a way to get the address ranges from the NT header somehow, and I do know which process I have been loaded into. Then I could enumerate all loaded DLLs and look inside their address ranges too.
Can anyone please suggest a method, or even tell me if I am on the right track?

You can enumerate all the loaded modules in your process via EnumProcessModules, using GetCurrentProcess as the process handle. Then for each module you can call GetModuleInformation, which fills in a MODULEINFO struct that tells you exactly where in memory the module is loaded and how large it is. Alternatively you can call GetModuleFileNameEx and examine the module on disk.
Do note that reading arbitrary memory in a process - even the one you're currently running in - can have issues. For example if another thread is running at the same time as yours then it can affect the module table as you're iterating over it.
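As a rough sketch of how the NT header idea from the question could be combined with this answer: the function below walks the section table of a mapped PE image and keeps the non-executable data sections. The names SectionInfo, list_data_sections and data_sections_of are made up for this illustration, the header offsets are taken from the PE/COFF layout, and the Windows-only wrapper assumes the module handle comes from EnumProcessModules.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// One section of a mapped PE image; rva is relative to the module base.
struct SectionInfo {
    std::string name;
    std::uint32_t rva;
    std::uint32_t virtual_size;
    std::uint32_t characteristics;
};

// Section flags from the PE/COFF specification (IMAGE_SCN_* in winnt.h).
constexpr std::uint32_t kInitializedData   = 0x00000040;
constexpr std::uint32_t kUninitializedData = 0x00000080;
constexpr std::uint32_t kMemExecute        = 0x20000000;

// Reads a header field; assumes a little-endian host, as on Windows.
template <typename T>
T read_at(const std::uint8_t* p, std::size_t off) {
    T v;
    std::memcpy(&v, p + off, sizeof v);
    return v;
}

// Returns the non-executable data sections of the image mapped at `base`.
// `size` is the number of readable bytes at `base` (SizeOfImage for a module).
std::vector<SectionInfo> list_data_sections(const std::uint8_t* base, std::size_t size) {
    std::vector<SectionInfo> out;
    if (size < 0x40 || base[0] != 'M' || base[1] != 'Z') return out;
    const std::size_t nt = read_at<std::uint32_t>(base, 0x3C);  // e_lfanew
    if (nt > size || size - nt < 24) return out;
    if (std::memcmp(base + nt, "PE\0\0", 4) != 0) return out;
    const std::uint16_t count    = read_at<std::uint16_t>(base, nt + 6);   // NumberOfSections
    const std::uint16_t opt_size = read_at<std::uint16_t>(base, nt + 20);  // SizeOfOptionalHeader
    std::size_t off = nt + 24 + opt_size;  // first section header, 40 bytes each
    for (std::uint16_t i = 0; i < count; ++i, off += 40) {
        if (off + 40 > size) break;
        char name[9] = {0};  // section names are not always NUL-terminated
        std::memcpy(name, base + off, 8);
        SectionInfo s{name,
                      read_at<std::uint32_t>(base, off + 12),   // VirtualAddress
                      read_at<std::uint32_t>(base, off + 8),    // VirtualSize
                      read_at<std::uint32_t>(base, off + 36)};  // Characteristics
        const bool data = (s.characteristics & (kInitializedData | kUninitializedData)) != 0;
        const bool exec = (s.characteristics & kMemExecute) != 0;
        if (data && !exec) out.push_back(s);
    }
    return out;
}

#ifdef _WIN32
#include <windows.h>
#include <psapi.h>  // link with Psapi.lib

// Data sections of one loaded module of the current process.
std::vector<SectionInfo> data_sections_of(HMODULE module) {
    MODULEINFO mi{};
    if (!GetModuleInformation(GetCurrentProcess(), module, &mi, sizeof mi)) return {};
    return list_data_sections(static_cast<const std::uint8_t*>(mi.lpBaseOfDll), mi.SizeOfImage);
}
#endif
```

The address to search is the module base plus rva, for virtual_size bytes. Sections such as .rsrc and .reloc also carry the initialized-data flag, so a caller may want to filter by name as well.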

After some testing: a Win32 process may use memory that it has acquired via a number of methods. I think it all ends up in VirtualAlloc, with HeapCreate et al. sitting a bit higher level. In the end the data gem may be in a module's "data" segments, on a heap, or even on a stack, the latter two both allocated with VirtualAlloc. There may well be other memory allocation methods.
When we look at a Windows process, it will have a bunch of DLLs loaded, many of which will be using their own "heap" and/or direct VirtualAlloc calls. Others will be sharing the main process's heap.
I have enumerated the process's heaps using GetProcessHeaps and then HeapWalk concentrating only on PROCESS_HEAP_ENTRY_BUSY and I have, luckily, found what I was looking for. My "heapwalk" is by no means an exhaustive search.
I have not found a way, and it is academic for me now, to link a heap entry (block) to a particular module. Similarly if I were to look into all the VirtualAllocs I would not know how to trace the allocated blocks back to some code running inside a module. But that step is academic.
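A minimal sketch of the heap walk described above, assuming the pattern is a plain byte sequence. find_first and find_in_busy_heap_blocks are names invented for this example; the Windows part uses GetProcessHeaps, HeapLock, HeapWalk and HeapUnlock, and returns only the first match so that nothing is allocated while a heap is being walked.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Offset of the first occurrence of needle[0..n) in data[0..size), or kNotFound.
std::size_t find_first(const std::uint8_t* data, std::size_t size,
                       const std::uint8_t* needle, std::size_t n) {
    if (n == 0 || size < n) return kNotFound;
    const std::uint8_t* end = data + size;
    const std::uint8_t* it = std::search(data, end, needle, needle + n);
    return it == end ? kNotFound : static_cast<std::size_t>(it - data);
}

#ifdef _WIN32
#include <windows.h>

// Walks every heap of the current process and searches the busy (allocated)
// blocks. Keep the needle off the heap (stack or static storage), otherwise
// the search may simply find the needle itself.
const void* find_in_busy_heap_blocks(const std::uint8_t* needle, std::size_t n) {
    DWORD count = GetProcessHeaps(0, nullptr);
    std::vector<HANDLE> heaps(count);
    count = GetProcessHeaps(count, heaps.data());
    if (count > heaps.size()) count = static_cast<DWORD>(heaps.size());  // heaps created meanwhile are skipped
    for (DWORD h = 0; h < count; ++h) {
        if (!HeapLock(heaps[h])) continue;  // keep other threads out during the walk
        const void* hit = nullptr;
        PROCESS_HEAP_ENTRY entry{};         // lpData == nullptr starts the enumeration
        while (hit == nullptr && HeapWalk(heaps[h], &entry)) {
            if (!(entry.wFlags & PROCESS_HEAP_ENTRY_BUSY)) continue;
            const auto* block = static_cast<const std::uint8_t*>(entry.lpData);
            const std::size_t off = find_first(block, entry.cbData, needle, n);
            if (off != kNotFound) hit = block + off;
        }
        HeapUnlock(heaps[h]);
        if (hit != nullptr) return hit;
    }
    return nullptr;
}
#endif
```

Like the walk described above, this is not exhaustive: memory obtained directly from VirtualAlloc, thread stacks and module data sections are not covered.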

Related

Discarding DLL's resource after using it

I'm creating a dll with an embedded binary resource. Currently when I load this DLL it gets memory mapped into my process address space. The problem is that the embedded binary resource is huge and I don't want to keep it around once I'm done using it.
I tried looking up documentation regarding this, and apparently there are sections in the PE file which don't get memory mapped (the relocation section). Also, I can create new sections and flag them IMAGE_SCN_MEM_DISCARDABLE, but this flag is ignored outside of kernel mode.
There was a win API function which supported freeing resources for 16-bit Windows but doesn't work 32-bit onward. The documentation says "This function is obsolete and is only supported for backward compatibility with 16-bit Windows. For 32-bit Windows applications, it is not necessary to free the resources loaded using LoadResource. If used on 32 or 64-bit Windows systems, this function will return FALSE". I don't know what they mean by that but it seems like they don't expect resources to be huge and can be accommodated in the address space.
Is there any way for me to discard the resources I load after I'm done using them?
The system will discard them if it needs to. So long as you are not referring to the memory, it can be discarded and paged out if the system needs the physical memory for something else. So, it won't stop physical memory being used for that which needs it.
That said, linked resources are not intended to be huge. The point is that a module is mapped into a contiguous range of memory. If your module is really huge then it may be impossible to find such a contiguous range of memory. What's more, the module's address range is reserved for as long as the module is loaded. That means that nothing else in the process can use that virtual memory address range. So even if a contiguous address range can be found, it stays reserved for the module and cannot be used for anything else. And this can easily become a problem for 32-bit applications.
So, by putting the huge resource in memory you won't incur a long-standing drain on physical resources, but you will put an unavoidable constraint on virtual memory address space resources.
The conclusion to draw is that such huge objects should be held in external files and not linked to the module as a resource. If you absolutely must use a resource in a PE module, then put the resource into a separate DLL. Load the DLL with LoadLibrary, pull out the resource using the module handle you got from LoadLibrary, and then unload the DLL with FreeLibrary.

How can you track memory across DLL boundaries

I want performant run-time memory metrics, so I wrote a memory tracker based on overloading new and delete. It basically lets you walk your allocations in the heap and analyze everything about them: fragmentation, size, time, number, callstack, etc. But it has two fatal flaws: it can't track memory allocated in other DLLs, and when ownership of objects is passed to DLLs or vice versa, crashes ensue. And some smaller flaws: if a user uses malloc instead of new, the allocation is untracked; likewise if a user defines a class-specific new/delete.
How can I eliminate these flaws? I think I must be going about this fundamentally incorrectly by overloading new/delete, is there a better way?
The right way to implement this is to use detours and a separate tool that runs in its own process. The procedure is roughly the following:
1. Allocate memory in the remote process.
2. Place there the code of a small loader that will load your DLL.
3. Call the CreateRemoteThread API to run your loader.
4. From inside the loaded DLL, establish detours (hooks, interceptors) on the alloc/dealloc functions.
5. Process the calls and track activity.
If you implement your tool this way, it will not matter which DLL the memory allocation routines are called from, or whether they are called directly from the exe. Plus you can track activity in any process, not only in ones that you compiled yourself.
MS Windows allows checking contents of the virtual address space of the remote process. You can summarize use of virtual address space that was collected this way in a histogram, like the following:
From this picture you can see how many virtual allocations of each size exist in your target process.
The picture above shows an overview of the virtual address space usage in 32-bit MSVC DevEnv. A blue stripe means a committed piece of memory, a magenta stripe means reserved memory. Green is the unoccupied part of the address space.
You can see that the lower addresses are pretty fragmented, while the middle area is not. The blue lines at high addresses are various DLLs that are loaded into the process.
You should find out the common memory management routines that are called by new/delete and malloc/free, and intercept those. It is usually malloc/free in the end, but check to make sure.
On UNIX, I would use LD_PRELOAD with some library that re-implemented those routines. On Windows, you have to hack a little bit, but this link seems to give a good description of the process. It basically suggests that you use Detours from Microsoft Research.
Passing ownership of objects between modules is fundamentally flawed. It showed up with your custom allocator, but there are plenty of other cases that will fail also:
compiler upgrades, and recompiling only some DLLs
mixing compilers from different vendors
statically linking the runtime library
Just to name a few. Free every object from the same module that allocated it (often by exporting a deletion function, such as IUnknown::Release()).
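To illustrate the "free it where it was allocated" rule, here is a small sketch with a hypothetical Widget type. In a real project widget_create, widget_destroy and widget_live_count would be exported from the DLL (for example with __declspec(dllexport)); they are shown in one file here only to keep the example self-contained.

```cpp
#include <memory>

// --- What the DLL would own and export (names are illustrative only) ---
struct Widget {
    int value;
};

static int g_live_widgets = 0;  // the module can count its own allocations

Widget* widget_create(int value) {
    ++g_live_widgets;
    return new Widget{value};  // allocated with the DLL's runtime
}

void widget_destroy(Widget* w) {
    if (w != nullptr) {
        --g_live_widgets;
        delete w;  // released by the same runtime that allocated it
    }
}

int widget_live_count() { return g_live_widgets; }

// --- What the client would do ---
// The deleter is a pointer to the exported function, so the client never
// calls delete on memory that another module allocated.
using WidgetPtr = std::unique_ptr<Widget, void (*)(Widget*)>;

WidgetPtr make_widget(int value) {
    return WidgetPtr(widget_create(value), &widget_destroy);
}
```

Because every allocation and release passes through the module's own functions, per-module counts like the one above come almost for free, without hooking new/delete globally.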

Using VirtualQuery to find out which "file" uses certain page in memory

I'm using VirtualQuery to go through the virtual address space of my application. But I'd like to identify everything allocated by the application, not just my exe, something like SysInternals' VMMap application. And I need to know which pages belong to which file (I need to identify pages allocated for my application and its DLLs). How can I achieve this?
You can use CreateToolhelp32Snapshot with TH32CS_SNAPMODULE to retrieve the modules' base addresses and sizes. For heaps, you can use GetProcessHeaps() and HeapWalk() to get the different heap regions (both committed and reserved).
Other things (thread stacks, mapped memory) seem harder to retrieve.
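One way to tie the two pieces together is to keep the module list sorted by base address and look up each region that VirtualQuery reports. The sketch below assumes exactly that; ModuleRange, owner_of and snapshot_modules are names invented for this example, and the snapshot part uses the wide-character Toolhelp functions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Address range occupied by one loaded module.
struct ModuleRange {
    std::uintptr_t base;
    std::size_t size;
    std::wstring path;
};

// Finds the module whose range contains addr, or nullptr if none does.
// `sorted` must be ordered by ascending base address.
const ModuleRange* owner_of(const std::vector<ModuleRange>& sorted, std::uintptr_t addr) {
    auto it = std::upper_bound(sorted.begin(), sorted.end(), addr,
                               [](std::uintptr_t a, const ModuleRange& m) { return a < m.base; });
    if (it == sorted.begin()) return nullptr;  // addr lies below every module
    --it;                                      // last module starting at or before addr
    return (addr - it->base < it->size) ? &*it : nullptr;
}

#ifdef _WIN32
#include <windows.h>
#include <tlhelp32.h>

// Takes a snapshot of the modules in the current process, sorted by base.
std::vector<ModuleRange> snapshot_modules() {
    std::vector<ModuleRange> mods;
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32,
                                           GetCurrentProcessId());
    if (snap == INVALID_HANDLE_VALUE) return mods;
    MODULEENTRY32W me{};
    me.dwSize = sizeof me;  // must be set before the first call
    for (BOOL ok = Module32FirstW(snap, &me); ok; ok = Module32NextW(snap, &me)) {
        mods.push_back({reinterpret_cast<std::uintptr_t>(me.modBaseAddr),
                        me.modBaseSize, me.szExePath});
    }
    CloseHandle(snap);
    std::sort(mods.begin(), mods.end(),
              [](const ModuleRange& a, const ModuleRange& b) { return a.base < b.base; });
    return mods;
}
#endif
```

While walking with VirtualQuery, passing each region's BaseAddress to owner_of gives the file for image pages. Regions that match no module are heaps, stacks, mapped files or other private allocations; for mapped files, GetMappedFileName from psapi can usually report the backing file.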

How can I scan another process memory to find what follows a specific string?

I want to scan the entire heap of a currently running native application through another process.
For example, I want to know what follows all the instances of the ASCII sequence "test" in this process memory (in this case I would scan for "test" and keep reading after it).
I tried to google for more information but didn't find much: I found ReadProcessMemory which looked interesting, but how can I know the memory addresses a process has allocated?
Try VirtualQueryEx.
If you're finding that you're accessing a lot of memory in the other process, consider using CreateRemoteThread (sample code). This will allow you to inject your own DLL into the other process and run code there directly. Once you're running code in the other process, you'll be able to access memory as normal, without needing to use ReadProcessMemory. (You'll still need VirtualQuery to determine the process's memory layout.)
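For the ReadProcessMemory route, a sketch of a chunked scan might look like the following. The reader is abstracted as a callback so the search logic can be exercised without a second process; Reader, scan_region, make_process_reader and scan_process are names made up here. The process handle is assumed to have been opened with PROCESS_QUERY_INFORMATION and PROCESS_VM_READ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Copies up to n bytes from address addr into buf; returns the bytes copied.
using Reader = std::function<std::size_t(std::uint64_t addr, std::uint8_t* buf, std::size_t n)>;

// Scans [start, start + length) in chunks and returns the address of every
// match. The last needle.size() - 1 bytes of each chunk are carried over so
// that matches straddling a chunk boundary are still found.
std::vector<std::uint64_t> scan_region(const Reader& read, std::uint64_t start,
                                       std::uint64_t length,
                                       const std::vector<std::uint8_t>& needle,
                                       std::size_t chunk = 64 * 1024) {
    std::vector<std::uint64_t> hits;
    if (needle.empty() || chunk < needle.size()) return hits;
    const std::size_t overlap = needle.size() - 1;
    std::vector<std::uint8_t> buf(chunk + overlap);
    std::size_t carried = 0;  // bytes kept from the previous chunk
    std::uint64_t pos = 0;    // offset of the next unread byte
    while (pos < length) {
        const std::size_t want =
            static_cast<std::size_t>(std::min<std::uint64_t>(chunk, length - pos));
        const std::size_t got = read(start + pos, buf.data() + carried, want);
        if (got == 0) break;  // unreadable; give up on this region
        const std::size_t valid = carried + got;
        const std::uint64_t buf_addr = start + pos - carried;  // address of buf[0]
        const auto first = buf.begin();
        const auto last = buf.begin() + static_cast<std::ptrdiff_t>(valid);
        for (auto it = std::search(first, last, needle.begin(), needle.end()); it != last;
             it = std::search(it + 1, last, needle.begin(), needle.end())) {
            hits.push_back(buf_addr + static_cast<std::uint64_t>(it - first));
        }
        carried = std::min(overlap, valid);
        std::memmove(buf.data(), buf.data() + (valid - carried), carried);
        pos += got;
    }
    return hits;
}

#ifdef _WIN32
#define NOMINMAX
#include <windows.h>

Reader make_process_reader(HANDLE process) {
    return [process](std::uint64_t addr, std::uint8_t* buf, std::size_t n) -> std::size_t {
        SIZE_T got = 0;
        ReadProcessMemory(process, reinterpret_cast<LPCVOID>(static_cast<std::uintptr_t>(addr)),
                          buf, n, &got);
        return got;
    };
}

// Scans every committed, accessible region that VirtualQueryEx reports.
// Matches spanning two adjacent regions are not detected by this sketch.
std::vector<std::uint64_t> scan_process(HANDLE process, const std::vector<std::uint8_t>& needle) {
    std::vector<std::uint64_t> all;
    const Reader read = make_process_reader(process);
    MEMORY_BASIC_INFORMATION mbi{};
    const BYTE* p = nullptr;
    while (VirtualQueryEx(process, p, &mbi, sizeof mbi) == sizeof mbi) {
        if (mbi.State == MEM_COMMIT && !(mbi.Protect & (PAGE_NOACCESS | PAGE_GUARD))) {
            const auto hits = scan_region(read, reinterpret_cast<std::uintptr_t>(mbi.BaseAddress),
                                          mbi.RegionSize, needle);
            all.insert(all.end(), hits.begin(), hits.end());
        }
        p = static_cast<const BYTE*>(mbi.BaseAddress) + mbi.RegionSize;
    }
    return all;
}
#endif
```

To see what follows each instance of "test", read a few more bytes starting at hit + needle.size() with the same reader.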

How to create binary/hex dump of another process's memory?

I am having trouble finding a reasonable way to dump another process's memory to a file.
After extensive searching, I've been able to find a nice article at CodeProject that has *most* of the functionality I want:
Performing a hex dump of another process's memory. This does a good job of addressing permission issues and sets a good foundation.
However, with this utility I've seen that even a small process, such as a clean Notepad.exe or Calc.exe instance, can generate a dump file over 24MB in size, while the process itself runs under 20KB in memory according to Task Manager.
The article has led me to believe that perhaps it is also dumping things in shared memory, possibly DLL space and the like. For example, a dump of Calc.exe will include sections that include method names (and presumably memory) from Kernel32.dll:
²³´µKERNEL32.dll ActivateActCtx AddAtomA AddAtomW AddConsoleAliasA AddConsoleAliasW AddLocalAlternateComputerNameA AddLocalAlternateComputerNameW AddRefActCtx AddVectoredExceptionHandler AllocConsole AllocateUserPhysicalPages AreFileApisANSI AssignProcessToJobObject AttachConsole BackupRead BackupSeek BackupWrite BaseCheckAppcompatCache BaseCleanupAppcompatCache
Is there a better way to dump the memory of another process that doesn't lead to this overhead, or perhaps an improvement upon the linked article's code that solves this problem? I want to get the memory that actually belongs to the process itself. I'd be okay with dumping the memory space of functions that are actually used in DLLs, but it seems unnecessary to dump the *entire* contents of multiple DLLs to get the running memory of the process.
I'm looking for a way to get the 30-60KB of a 30KB process, rather than 25MB for a 30KB process. Or at least closer than I can get currently.
Thanks in advance for your suggestions and guidance, it is appreciated.
Note: This is for a console utility, so GUI elements like the ones in the CodeProject article are unimportant.
You're basically asking for a user process minidump. The Windows Debug Helper library has a ready made function for this, MiniDumpWriteDump.
The MINIDUMP_TYPE parameter passed to the function gives coarse control over the amount of detail contained in the minidump. The most basic, MiniDumpNormal, will only capture the call stack of each thread in the process. The amount of memory captured gets progressively more detailed with the other minidump types.
You can also finely control the amount of information written into the minidump by providing a callback to the MiniDumpWriteDump function and setting the flags on the MINIDUMP_CALLBACK_OUTPUT structure in the callback.
The resulting minidumps can be read with a debugger like WinDbg or Visual Studio, or they can be processed by the various functions in the dbghelp.dll library.
Not really a "how to program it" answer, but I just found your question while looking for a tool that could do that, when I ran into PMDump:
http://ntsecurity.nu/toolbox/pmdump/
It's dead easy and simple to use, and creates correct dumps (I just tried it with some programs).