Base dll address always different and hash mismatch - c++

I made a program that reads X bytes from a loaded dll (module) of a process and hashes them to compare them to a clean hash which is hardcoded. The base address of the module is always the same (tested at a few different computers by different people on XP and 7) and the hash is also always the same.
But for one person the base address is always different and the hash is also always different (different on every run). He is using Windows 7 Ultimate.
My questions are:
Why would the base address of the module always be different? I know that dlls can be loaded at different addresses but what triggers this behavior? (Is DLL always have the same Base Address?) The base address is always of type 0x02XXXXXX while the never-changing address everyone else gets is 0x6F000000.
Why does the hash mismatch? Even if the module is loaded at a different address, I still read the same number of bytes from base+someoffset. Not only is the hash different, it is different every time you run the program. Because of this I suspect that the base address is actually wrong and something fishy is going on. I compared the md5 of my dll and his dll and they are the same, so the library being loaded is definitely identical.
The steps taken in the code:
Get a process handle (CreateToolhelp32Snapshot, Process32Next)
Enumerate the loaded modules (EnumProcessModules)
Find the specific module by name (GetModuleFileNameEx) and get the handle
Add an additional offset to the module base address (offset within a module)
Read X bytes from the module with ReadProcessMemory(hProcess, base_of_module+some_additional_offset, dllBuffer_to_read_into, 0x100000, &numRead) where 0x100000 does not overflow the module size
What this program is doing is comparing the in-memory dll content against a "clean" hash to discover tampering from malware/hacks/etc.

Your approach cannot hope to succeed. The base address of a DLL is only a guide to the loader. The loader may choose to load the DLL at that address. And if it does so it does not need to fix up any absolute references.
However, if the requested address is not available (something else in the process has already reserved the requested address range) or the loader chooses not to use the requested address (ASLR for instance), then the DLL will be loaded at some other address. And then the relocation table will be used to modify absolute references.
In order for your hash calculation to be robust, you would need to account for the relocations. You could, in principle, read the relocation table, and account for the relocations when performing the hash calculation. However, that is likely to be very tricky to get right.
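As a rough illustration of what "accounting for the relocations" could involve, here is a minimal sketch that undoes the loader's fixups in a local copy of the module before it is hashed. It assumes a 32-bit image where only IMAGE_REL_BASED_HIGHLOW entries (type 3) matter, that the buffer is indexed by RVA, and that the caller has already read the base relocation directory's RVA and size from the PE headers. The names undoRelocations, rd32, rd16 and wr32 are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian helpers that make no alignment assumptions.
static std::uint32_t rd32(const std::uint8_t* p) {
    return p[0] | (p[1] << 8) | (p[2] << 16) | (std::uint32_t(p[3]) << 24);
}
static std::uint16_t rd16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static void wr32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// image:     local copy of the module as mapped in memory (index == RVA)
// relocRva:  RVA of the base relocation directory (assumed known)
// relocSize: size of that directory in bytes (assumed known)
// delta:     actual load address minus preferred ImageBase (wraps modulo 2^32)
// Returns the number of 32-bit fixups that were reverted.
std::size_t undoRelocations(std::vector<std::uint8_t>& image,
                            std::uint32_t relocRva, std::uint32_t relocSize,
                            std::uint32_t delta) {
    // Caller contract: the directory lies inside the buffer.
    assert(std::size_t(relocRva) + relocSize <= image.size());
    std::size_t undone = 0;
    std::size_t pos = relocRva;
    const std::size_t end = std::size_t(relocRva) + relocSize;
    while (pos + 8 <= end) {
        // Each block: 4-byte page RVA, 4-byte block size, then 16-bit entries.
        const std::uint32_t pageRva = rd32(&image[pos]);
        const std::uint32_t blockSize = rd32(&image[pos + 4]);
        if (blockSize < 8 || pos + blockSize > end) break;  // malformed block
        for (std::size_t e = pos + 8; e + 2 <= pos + blockSize; e += 2) {
            const std::uint16_t entry = rd16(&image[e]);
            const unsigned type = entry >> 12;               // high 4 bits
            const std::size_t target = std::size_t(pageRva) + (entry & 0x0FFF);
            if (type != 3) continue;  // type 0 is padding; others ignored in this sketch
            if (target + 4 > image.size()) continue;
            // The loader added delta to this value; subtract it again.
            wr32(&image[target], rd32(&image[target]) - delta);
            ++undone;
        }
        pos += blockSize;
    }
    return undone;
}
```

After this pass the buffer should match what the same bytes would look like at the preferred base, so the hash no longer depends on where the DLL happened to load. A 64-bit image would need the 8-byte relocation type handled in the same way.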

Some other DLL, configured to load in every process, is loaded on that system. This can e.g. happen if you install webcam or mouse software, these tend to force their DLLs to be loaded in every process. Of course, if this DLL is loaded at an address preventing your preferred base address from being used, your DLL will be relocated.
Relocations. When a DLL is loaded, the .reloc section is parsed by the loader, and corrections to absolute addresses are written directly to the loaded DLL image. In order to create a correct hash, you must also read the relocation directory and correct for these loader modifications of the DLL.

The most likely cause is EMET, the Enhanced Mitigation Experience Toolkit from Microsoft.
One of the things EMET does is to enforce ASLR (Address Space Layout Randomization), i.e., it forces all DLLs to be loaded at random addresses, even if they aren't configured to use ASLR. This makes it considerably more difficult for an attacker to exploit vulnerabilities.

Related

Call function from executable

I want to call a function from an executable. The only way to reach that process is to inject a dll in the parent process. I can inject a dll in the parent process but how do I call a function from the child process?
Something like
_asm
{
call/jmp address
}
doesn't work. I hope you understand what I mean.
If you are running inside the process, you need to know the offset of the function you want to call from the base of the module (the exe) which contains the function. Then, you just need to make a function pointer and call it.
#include <windows.h>
#include <cstdint>
// assuming the function you're calling returns void and takes 0 params
typedef void(__stdcall * voidf_t)();
// func_offset must be the function's offset from the base of the loaded module
// the module name is a string literal, so it needs double quotes
voidf_t func = (voidf_t) (((uint8_t *)GetModuleHandleA("module_name")) + func_offset);
func(); // the function you located is called here
The solution you have will work in 32-bit builds if you know the address of the function (MSVC does not support inline assembly when targeting x64), but you'll need to make sure you implement the calling convention properly. The code above uses GetModuleHandle to resolve the currently loaded base of the module whose function you want to call.
Once you've injected your module into the running process ASLR isn't really an issue, since you can just ask windows for the base of the module containing the code you wish to call. If you want to find the base of the exe running the current process, you can call GetModuleHandle with a parameter of NULL. If you are confident that the function offset is not going to change, you can hard code the offset of the function you wish to call, after you've found the offset in a disassembler or other tool. Assuming the exe containing the function isn't altered, that offset will be constant.
As mentioned in the comments, the calling convention is important in the function typedef, make sure it matches the calling convention of the function you're calling.
Execution Fundamentals
To call a function you need an address or an interrupt number. The address is loaded into the Program Counter register and execution is transferred. Some processors allow for "Software Interrupts", in which the program executes a special instruction that invokes the software interrupt. This is the foundation for executing functions.
More Background -- Relative Addresses
There are two common forms of executables: Absolute Addressing and Relative (or Position Independent Code, PIC). In absolute addressing, the functions are at hard-coded addresses. The functions won't move. This is usually used in embedded systems.
In the relative addressing model, the addresses are relative to the value in the Program Counter register. For example, your function may be 1024 bytes away, so the compiler would emit a relative branch instruction for 1024 bytes (away).
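To make the arithmetic concrete, here is a small sketch of how a relative branch target could be computed. It assumes the x86 convention, where the displacement is added to the address of the instruction that follows the branch; other architectures define the reference point differently. The function name relativeTarget and the addresses used below are invented for the example.

```cpp
#include <cstdint>

// Target of a PC-relative branch under the x86 convention:
// the signed displacement is added to the address of the next instruction.
constexpr std::uint64_t relativeTarget(std::uint64_t instrAddr,
                                       std::uint32_t instrLen,
                                       std::int32_t disp) {
    return instrAddr + instrLen +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}

// The same encoded displacement (1024) reaches the same function no matter
// where the code was loaded, because caller and callee move together.
constexpr std::uint64_t targetAtFirstLoad  = relativeTarget(0x1000, 5, 1024);
constexpr std::uint64_t targetAtSecondLoad = relativeTarget(0x127654, 5, 1024);
```

In both cases the target lies 5 + 1024 bytes past the start of the branch instruction, which is why code that only uses relative branches needs no patching when it is moved.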
Operating Systems and Moving Targets
Many operating systems load programs in different places for each invocation. This means your executable may start at address 1000, and the next time at address 127654. In these operating systems, there is no guarantee that an executable will be launched at the same location each time.
Executing within your program
Executing functions within your program is easy. The linker decides where all the functions will be located and determines how to execute them; whether to use absolute addressing, PIC or a mixture.
Executing Functions in another Executable
With the above knowledge, there are issues with executing functions in another program:
Location of the Function in the external executable
Determining if the executable is active
Calling protocol for the executable
Most executables do not contain any information about where their functions are, so you will need to know where it is. You will also need to know if the function is absolute addressing or PIC. You will also need to know if the function is in memory when you need it or if the OS has paged the function to the hard drive.
Knowing the function location is necessary. However, the location is of no use if the OS has not loaded the executable. Before you call a function in another executable, you will need to know if it is present in memory when the call is executed.
Lastly, you will need to know the protocol used for the external function. For example, are the values passed by register? Are they on the stack? Are they passed by pointer (address)?
A Solution: Shared Libraries
Operating systems (OS) have evolved to allow dynamic sharing of functions. These functions live in Dynamic Link Libraries (DLL) or shared libraries (.so). Your program asks the OS to load the library into memory, then asks for the function by name and calls the address it gets back.
The caveat is that the function you desire must be in a library. If the executable doesn't use a shared library or the function you need is not in a library, then your mission is more difficult.

How does GDB know where an executable has been relocated?

I know modern OSs such as Linux don't always execute an application at the same address it was originally linked. When a debugger starts looking around though, it needs to know the relationship between the original link address and the final executing address. How does GDB calculate the offset?
Clarifications: I'm not talking about virtual memory. That is, I have (what I believe to be) a reasonable understanding of how Virtual memory works and am operating entirely with in that address space. I have symbols that are at one location when I dump the symbol table from the ELF, but at another location when I get their address out of memory.
In this particular case, I have a string which in the linked executable is at address 0x0E984141. In a dump of memory from that process, it is at address 0x0E3F2781. Everything in the .rodata section at least has been shifted by 0x5919C0. It appears to be something like Address Space Layout Randomization.
I know modern OSs such as Linux don't always execute an application at the same address it was originally linked.
This is only possible for position-independent executables (linked with -pie flag).
When a debugger starts looking around though, it needs to know the relationship between the original link address and the final executing address.
Correct.
How does GDB calculate the offset?
The same way GDB calculates the offset for shared libraries (a PIE executable is really a special case of a shared library). There is a defined interface between ld.so and GDB, consisting of _dl_debug_state() function (on which GDB sets an internal breakpoint, and which ld.so calls whenever it maps a new ELF image into the process), and struct r_debug. The latter points to a linked list of struct link_maps, and l_addr member of that struct is the offset between linked-at and loaded-at address.
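The role of l_addr can be shown with the numbers from the question. This is only a sketch of the arithmetic, not of GDB's implementation; the helper names loadBias, toRuntime and toLinked are invented here. The offset is treated as an unsigned value that wraps, so it also covers an image that ends up below its link-time address, as in this case.

```cpp
#include <cstdint>

// l_addr is the difference between where an object was loaded and where it
// was linked: runtime address = link-time address + l_addr (modulo 2^64).
constexpr std::uint64_t loadBias(std::uint64_t linked, std::uint64_t runtime) {
    return runtime - linked;
}
constexpr std::uint64_t toRuntime(std::uint64_t linked, std::uint64_t bias) {
    return linked + bias;
}
constexpr std::uint64_t toLinked(std::uint64_t runtime, std::uint64_t bias) {
    return runtime - bias;
}

// Values taken from the question: the string is at 0x0E984141 in the ELF
// file and at 0x0E3F2781 in the memory dump.
constexpr std::uint64_t bias = loadBias(0x0E984141u, 0x0E3F2781u);
```

Here bias is the two's-complement encoding of -0x5919C0, matching the shift observed for .rodata. Once a debugger has this one value for an object, every symbol address from the file can be translated with toRuntime, and every address seen in memory can be mapped back with toLinked.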
If I understand what you are getting at, I think what you are actually referring to is virtual memory addressing. This is not handled by GDB; it is handled by the operating system.
http://www.cs.utexas.edu/users/witchel/372/lectures/15.VirtualMemory.pdf
On Linux, every process has its own address space in virtual memory.
The ELF executable contains a header describing the segments in memory (and their corresponding sections in the executable).

Is it possible to change the entry point of a process from a DLL?

The default entry point for most application processes is usually 0x401000.
Is there any way we could shift or change the entry point of a process? For example, if I wanted to change the entry point to 0x901000 externally using a DLL (assuming that the process loaded the DLL via C++)?
I'm trying to create a DLL to edit the process's default entry point.
Yes, you can change ImageBase in Optional Header of Portable Executable, if your linker allows this.
The ImageBase value is chosen by convention and is only a preferred address: the loader may ignore it if that memory range is already occupied. The older article quoted below gives 0x10000 for executables and 0x400000 for DLLs; current Microsoft linkers default to 0x400000 for 32-bit executables and 0x10000000 for 32-bit DLLs.
See http://msdn.microsoft.com/en-us/library/ms809762.aspx
Table 3. paragraph IMAGE_OPTIONAL_HEADER.ImageBase:
When the linker creates an executable, it assumes that the file will be memory-mapped to a specific location in memory. That address is stored in this field, assuming a load address allows linker optimizations to take place. If the file really is memory-mapped to that address by the loader, the code doesn't need any patching before it can be run. In executables produced for Windows NT, the default image base is 0x10000. For DLLs, the default is 0x400000. In Windows 95, the address 0x10000 can't be used to load 32-bit EXEs because it lies within a linear address region shared by all processes. Because of this, Microsoft has changed the default base address for Win32 executables to 0x400000. Older programs that were linked assuming a base address of 0x10000 will take longer to load under Windows 95 because the loader needs to apply the base relocations.
On Windows, the default load address for EXEs is 0x400000 - so that's where that part of 0x401000 comes from.
The 0x1000 component is the offset into the image in memory where (usually) the text segment that hold the bulk of the code starts. That's where this particular program's entry point is.
That offset is a field in the PE header, as is indeed the default load address of 0x400000. Both can be changed, but be aware that for EXEs, relocation information is often stripped: Since the default load address is always guaranteed to be free when a new process is first created, relocation information is often assumed to not be needed for EXEs.
If that is the case for your EXE then you can't change the load address without doing major surgery to the image to manually identify and fix up any references that are relative to the assumed 0x400000 load address used during compilation/linking.
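To see where both numbers live, here is a minimal sketch that reads ImageBase and AddressOfEntryPoint out of a PE file held in a byte buffer and adds them. The field offsets follow the published PE layout; the names PeEntryInfo, readLe and readPeEntryInfo are invented for the example, and error handling is kept to the bare minimum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PeEntryInfo {
    std::uint64_t imageBase;      // preferred load address (ImageBase)
    std::uint32_t entryPointRva;  // AddressOfEntryPoint, relative to the base
    // Entry point address if the image loads at its preferred base.
    std::uint64_t preferredEntryVa() const { return imageBase + entryPointRva; }
};

// Read an n-byte little-endian integer at offset off.
static std::uint64_t readLe(const std::vector<std::uint8_t>& b,
                            std::size_t off, std::size_t n) {
    assert(off + n <= b.size());
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) v |= std::uint64_t(b[off + i]) << (8 * i);
    return v;
}

// Returns false if the buffer does not look like a PE32 or PE32+ file.
bool readPeEntryInfo(const std::vector<std::uint8_t>& file, PeEntryInfo& out) {
    if (file.size() < 0x40 || file[0] != 'M' || file[1] != 'Z') return false;
    const std::size_t pe = static_cast<std::size_t>(readLe(file, 0x3C, 4));  // e_lfanew
    if (pe > file.size()) return false;
    const std::size_t opt = pe + 4 + 20;  // skip signature and IMAGE_FILE_HEADER
    if (opt + 32 > file.size()) return false;
    if (readLe(file, pe, 4) != 0x00004550) return false;  // "PE\0\0"
    const std::uint64_t magic = readLe(file, opt, 2);
    out.entryPointRva = static_cast<std::uint32_t>(readLe(file, opt + 16, 4));
    if (magic == 0x10B) {
        out.imageBase = readLe(file, opt + 28, 4);  // PE32: 4-byte ImageBase
    } else if (magic == 0x20B) {
        out.imageBase = readLe(file, opt + 24, 8);  // PE32+: 8-byte ImageBase
    } else {
        return false;
    }
    return true;
}
```

For a typical 32-bit EXE with ImageBase 0x400000 and AddressOfEntryPoint 0x1000, preferredEntryVa() gives the 0x401000 from the question. If the loader places the image elsewhere, the real entry point is the actual base plus the same RVA.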

Understanding Dynamic Library loading in Linux

I am trying to understand Dynamic Library loading in Linux from here [1] and want to clarify the concept. Concretely, when a dynamic library is loaded in a process in a Linux environment, it is loaded at any point in the address space. Now, a library has a code segment, and a data segment. The code segment's address is not defined pre-linking so it is 0x0000000 while for data segment, some number is defined to be an address.
But here is the trick, this address of data segment is not actually the true address. Actually, at whatever position code segment is loaded, data segment's pre-defined address is added to it.
Am I correct here?
One more thing from the referenced article. What does this statement mean?
However, we have the constraint that the shared library must still have a unique data instance in each process. While it would be possible to put the library data anywhere we want at runtime, this would require leaving behind relocations to patch the code and inform it where to actually find the data — destroying the always read-only property of the code and thus sharability.
[1] http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
Actually, at whatever position code segment is loaded, data segment's pre-defined address is added to it.
Yes. The "VirtAddr" of the data segment will be added to base address.
What does this statement mean?
It means that when the library accesses its own static data, the library code should not need relocations to do so. Otherwise the dynamic linker would have to patch the code itself, and the patched parts of the library could no longer be shared between processes (if process1 loads library lib1 at 0x40000000 and process2 loads lib1 at 0x50000000, their data relocations will be different).
So a different solution is used in practice. The library's code and data are loaded together, and the offset between code and data is fixed in all cases. The "solution" is described right after the text you cited: http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
As you can see from the above headers, the solution is that the read-write data section is always put at a known offset from the code section of the library. This way, via the magic of virtual-memory, every process sees its own data section but can share the unmodified code. All that is needed to access data is some simple maths; address of thing I want = my current address + known fixed offset.

Specify the memory start address for a process

I wish to know if it is possible to load the process at a user (pre)specified address?
Thanks,
Ashutosh
The base address is specified in the PE file. If you mean an EXE that you're building with MSVC, you can set the base address in the linker settings (the /BASE option). For an arbitrary existing EXE or DLL, you could alter the base address by hand, with the help of a good PE reference or editor. You should also turn off ASLR, which is both a project setting (/DYNAMICBASE:NO) and a flag in the PE file.
Most EXE files load at their preferred base address because, when a process starts, the EXE is the first thing mapped into an otherwise empty address space. For that reason it's not unheard of for EXE files to omit the relocation table. DLLs, however, sometimes have to be rebased. It's not a good idea at all to depend on loading at a specific base address.