Specify the memory start address for a process - c++

I wish to know if it is possible to load the process at a user (pre)specified address?
Thanks,
Ashutosh

The preferred base address is specified in the PE file. If you mean an EXE that you are building with MSVC, you can set the base address in the linker settings (the /BASE option). For an arbitrary EXE or DLL, you could alter the base address by hand, using a PE editor or a good reference on the PE format. You should also turn off ASLR, which is likewise both a project setting (/DYNAMICBASE:NO) and a flag in the PE file.
Most EXE files load at their preferred base address, because the EXE is the first image mapped into a new process's address space, and it's not unheard of for EXE files to omit the relocation table entirely. DLLs, however, sometimes have to be rebased. Depending on loading at a specific base address is not a good idea at all.
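To check what a given file actually asks for, the fields mentioned above can be read straight from the PE headers. The following is a minimal sketch that works on the raw file bytes and uses no Windows headers; the names PeBaseInfo and read_pe_base_info are made up for this example, and the offsets are those of the documented PE32/PE32+ optional header layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fields of interest from a PE file's optional header (illustrative struct).
struct PeBaseInfo {
    std::uint64_t image_base;  // ImageBase: preferred load address
    std::uint32_t entry_rva;   // AddressOfEntryPoint, relative to the base
    bool dynamic_base;         // DllCharacteristics & 0x0040 (ASLR opt-in)
};

// Read an n-byte little-endian unsigned integer at offset `off`.
static std::uint64_t read_le(const std::vector<std::uint8_t>& b,
                             std::size_t off, std::size_t n) {
    assert(off + n <= b.size());
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
        v |= static_cast<std::uint64_t>(b[off + i]) << (8 * i);
    return v;
}

// Returns false if the buffer does not look like a PE32 or PE32+ file.
bool read_pe_base_info(const std::vector<std::uint8_t>& file, PeBaseInfo& out) {
    if (file.size() < 0x40 || read_le(file, 0, 2) != 0x5A4D)  // "MZ"
        return false;
    const std::size_t pe = static_cast<std::size_t>(read_le(file, 0x3C, 4));  // e_lfanew
    const std::size_t opt = pe + 4 + 20;  // skip "PE\0\0" and the COFF file header
    if (opt + 72 > file.size() || read_le(file, pe, 4) != 0x00004550)
        return false;
    const std::uint64_t magic = read_le(file, opt, 2);
    if (magic == 0x10B) {                                // PE32: 4-byte ImageBase
        out.image_base = read_le(file, opt + 28, 4);
    } else if (magic == 0x20B) {                         // PE32+: 8-byte ImageBase
        out.image_base = read_le(file, opt + 24, 8);
    } else {
        return false;
    }
    out.entry_rva = static_cast<std::uint32_t>(read_le(file, opt + 16, 4));
    out.dynamic_base = (read_le(file, opt + 70, 2) & 0x0040) != 0;
    return true;
}
```

Load the file into a std::vector<std::uint8_t>, call read_pe_base_info, and compare image_base with the address the module really ends up at. If dynamic_base is set, the loader is free to pick a different address on every boot.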

Related

How does GDB know where an executable has been relocated?

I know modern OSs such as Linux don't always execute an application at the same address it was originally linked. When a debugger starts looking around though, it needs to know the relationship between the original link address and the final executing address. How does GDB calculate the offset?
Clarifications: I'm not talking about virtual memory. That is, I have (what I believe to be) a reasonable understanding of how virtual memory works and am operating entirely within that address space. I have symbols that are at one location when I dump the symbol table from the ELF, but at another location when I get their address out of memory.
In this particular case, I have a string which in the linked executable is at address 0x0E984141. In a dump of memory from that process, it is at address 0x0E3F2781. Everything in the .rodata section at least has been shifted by 0x5919C0. It appears to be something like Address Space Layout Randomization.
I know modern OSs such as Linux don't always execute an application at the same address it was originally linked.
This is only possible for position-independent executables (linked with -pie flag).
When a debugger starts looking around though, it needs to know the relationship between the original link address and the final executing address.
Correct.
How does GDB calculate the offset?
The same way GDB calculates the offset for shared libraries (a PIE executable is really a special case of a shared library). There is a defined interface between ld.so and GDB, consisting of the _dl_debug_state() function (on which GDB sets an internal breakpoint, and which ld.so calls whenever it maps a new ELF image into the process) and struct r_debug. The latter points to a linked list of struct link_map entries, and the l_addr member of that struct is the offset between the linked-at and loaded-at address.
If I understand what you are getting at, I think what you are actually referring to is virtual memory addressing. This is not handled by GDB; it is handled by the operating system.
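GDB reads r_debug from outside the process, which is awkward to show in a few lines. As a rough in-process illustration, glibc's dl_iterate_phdr() reports the same kind of load offset for every mapped ELF object in its dlpi_addr field. This sketch assumes Linux with glibc; LoadedObject, collect and loaded_objects are names invented here, and this is not how GDB itself is implemented.

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (glibc)

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One mapped ELF object and the offset the loader applied to it.
struct LoadedObject {
    std::string name;          // empty string for the main program
    std::uintptr_t load_bias;  // loaded-at address minus linked-at address
};

// Callback invoked by dl_iterate_phdr once per loaded object.
static int collect(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    assert(data != nullptr);
    auto* out = static_cast<std::vector<LoadedObject>*>(data);
    LoadedObject obj;
    obj.name = info->dlpi_name ? info->dlpi_name : "";
    obj.load_bias = static_cast<std::uintptr_t>(info->dlpi_addr);
    out->push_back(obj);
    return 0;  // a non-zero return value would stop the iteration
}

// Snapshot of everything currently mapped by the dynamic loader.
std::vector<LoadedObject> loaded_objects() {
    std::vector<LoadedObject> objects;
    dl_iterate_phdr(collect, &objects);
    return objects;
}
```

For a non-PIE executable the first entry normally has a load_bias of 0; for a PIE executable it changes from run to run when ASLR is enabled, and that number is what a debugger has to add to every symbol address taken from the ELF file.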
http://www.cs.utexas.edu/users/witchel/372/lectures/15.VirtualMemory.pdf
On Linux, every process has its own address space in virtual memory.
The ELF executable contains a header describing the segments in memory (and their corresponding sections in the executable).

Base dll address always different and hash mismatch

I made a program that reads X bytes from a loaded dll (module) of a process and hashes them to compare them to a clean hash which is hardcoded. The base address of the module is always the same (tested at a few different computers by different people on XP and 7) and the hash is also always the same.
But for one person the base address is always different and the hash is also always different (different on every run). He is using Windows 7 Ultimate.
My questions are:
Why would the base address of the module always be different? I know that dlls can be loaded at different addresses but what triggers this behavior? (Is DLL always have the same Base Address?) The base address is always of type 0x02XXXXXX while the never-changing address everyone else gets is 0x6F000000.
Why does the hash mismatch? Even if the module is loaded at a different address, I still read the same number of bytes from base+someoffset. Not only is the hash different, it is different every time you run the program. Because of this I suspect that the base address is actually wrong and something fishy is going on. I compared the md5 of my dll and his dll and they are the same, so the library being loaded is definitely identical.
The steps taken in the code:
Get a process handle (CreateToolhelp32Snapshot, Process32Next)
Enumerate the loaded modules (EnumProcessModules)
Find the specific module by name (GetModuleFileNameEx) and get the handle
Add an additional offset to the module base address (offset within a module)
Read X bytes from the module with ReadProcessMemory(hProcess, base_of_module+some_additional_offset, dllBuffer_to_read_into, 0x100000, &numRead) where 0x100000 does not overflow the module size
What this program is doing is comparing the in-memory dll content against a "clean" hash to discover tampering from malware/hacks/etc.
Your approach cannot hope to succeed. The base address of a DLL is only a guide to the loader. The loader may choose to load the DLL at that address. And if it does so it does not need to fix up any absolute references.
However, if the requested address is not available (something else in the process has already reserved the requested address range) or the loader chooses not to use the requested address (ASLR for instance), then the DLL will be loaded at some other address. And then the relocation table will be used to modify absolute references.
In order for your hash calculation to be robust, you would need to account for the relocations. You could, in principle, read the relocation table, and account for the relocations when performing the hash calculation. However, that is likely to be very tricky to get right.
Some other DLL, configured to load in every process, is loaded on that system. This can e.g. happen if you install webcam or mouse software, these tend to force their DLLs to be loaded in every process. Of course, if this DLL is loaded at an address preventing your preferred base address from being used, your DLL will be relocated.
Relocations. When a DLL is loaded, the .reloc section is parsed by the loader, and corrections to absolute addresses are written directly to the loaded DLL image. In order to create a correct hash, you must also read the relocation directory and correct for these loader modifications of the DLL.
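A rough sketch of that correction: walk the base relocation blocks and subtract the load delta from every patched location before hashing. It assumes you already hold a copy of the mapped image in a buffer indexed by RVA (so the buffer starts at the module base), that you have taken the relocation directory's RVA and size from the PE data directory, and that the host is little-endian. undo_base_relocations and subtract_at are illustrative names, and only the two common relocation types are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Base relocation types defined by the PE format.
constexpr unsigned kRelAbsolute = 0;   // padding entry, nothing to patch
constexpr unsigned kRelHighLow  = 3;   // 32-bit absolute address
constexpr unsigned kRelDir64    = 10;  // 64-bit absolute address

// Subtract `delta` from the T-sized value stored at `rva` in the image copy.
template <typename T>
static void subtract_at(std::vector<std::uint8_t>& image, std::size_t rva,
                        std::uint64_t delta) {
    assert(rva + sizeof(T) <= image.size());
    T value;
    std::memcpy(&value, &image[rva], sizeof(T));
    value = static_cast<T>(value - static_cast<T>(delta));
    std::memcpy(&image[rva], &value, sizeof(T));
}

// Undo the loader's base relocations in a copy of a mapped image.
// `delta` is the actual load address minus the preferred ImageBase.
void undo_base_relocations(std::vector<std::uint8_t>& image,
                           std::size_t reloc_rva, std::size_t reloc_size,
                           std::uint64_t delta) {
    std::size_t pos = reloc_rva;
    const std::size_t end = reloc_rva + reloc_size;
    assert(end <= image.size());
    while (pos + 8 <= end) {
        std::uint32_t page_rva = 0, block_size = 0;  // block header: page RVA, block size
        std::memcpy(&page_rva, &image[pos], 4);
        std::memcpy(&block_size, &image[pos + 4], 4);
        if (block_size < 8 || pos + block_size > end) break;  // malformed block
        for (std::size_t e = pos + 8; e + 2 <= pos + block_size; e += 2) {
            std::uint16_t entry = 0;
            std::memcpy(&entry, &image[e], 2);
            const unsigned type = entry >> 12;                       // high 4 bits
            const std::size_t target = page_rva + (entry & 0x0FFFu); // low 12 bits
            if (type == kRelAbsolute) continue;
            if (type == kRelHighLow) subtract_at<std::uint32_t>(image, target, delta);
            else if (type == kRelDir64) subtract_at<std::uint64_t>(image, target, delta);
            // other relocation types are ignored in this sketch
        }
        pos += block_size;
    }
}
```

After the call, the buffer should match what the loader would have produced at the preferred base, so a hash over it can be compared with the hardcoded one. If the module loaded at its preferred address, delta is 0 and the buffer is unchanged.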
The most likely cause is EMET, the Enhanced Mitigation Experience Toolkit from Microsoft.
One of the things EMET does is to enforce ASLR (Address Space Layout Randomization), i.e., it forces all DLLs to be loaded at random addresses, even if they aren't configured to use ASLR. This makes it considerably more difficult for an attacker to exploit vulnerabilities.

Is it possible to change the entry point of a process from a DLL?

The default entry point for most application processes is usually 0x401000.
Is there any way we could shift or change the entry point of a process? For example, if I wanted to change the entry point to 0x901000 externally using a DLL (assuming that the process loaded the DLL via C++)?
I'm trying to create a DLL to edit the process's default entry point.
Yes, you can change ImageBase in the Optional Header of the Portable Executable, if your linker allows this.
Older linkers targeting Windows NT set ImageBase=0x10000 when linking an executable and 0x400000 when linking a DLL; current Microsoft linkers default to 0x400000 for 32-bit executables and 0x10000000 for 32-bit DLLs. The number is a convention rather than a requirement (I guess it was chosen because it is easy to remember and looks good in debuggers), and the loader may disregard it if the memory is already occupied.
See http://msdn.microsoft.com/en-us/library/ms809762.aspx
Table 3. paragraph IMAGE_OPTIONAL_HEADER.ImageBase:
When the linker creates an executable, it assumes that the file will be memory-mapped to a specific location in memory. That address is stored in this field, assuming a load address allows linker optimizations to take place. If the file really is memory-mapped to that address by the loader, the code doesn't need any patching before it can be run. In executables produced for Windows NT, the default image base is 0x10000. For DLLs, the default is 0x400000. In Windows 95, the address 0x10000 can't be used to load 32-bit EXEs because it lies within a linear address region shared by all processes. Because of this, Microsoft has changed the default base address for Win32 executables to 0x400000. Older programs that were linked assuming a base address of 0x10000 will take longer to load under Windows 95 because the loader needs to apply the base relocations.
On Windows, the default load address for EXEs is 0x400000 - so that's where that part of 0x401000 comes from.
The 0x1000 component is the offset into the image in memory where (usually) the text segment that holds the bulk of the code starts. That's where this particular program's entry point is.
That offset is a field in the PE header, as is the default load address of 0x400000. Both can be changed, but be aware that for EXEs, relocation information is often stripped: since the default load address is always guaranteed to be free when a new process is first created, relocation information is often assumed to not be needed for EXEs.
If that is the case for your EXE then you can't change the load address without doing major surgery to the image to manually identify and fix up any references that are relative to the assumed 0x400000 load address used during compilation/linking.
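The arithmetic behind these numbers is small enough to write down. A sketch, with rva_to_va and rebase as names invented for this example:

```cpp
#include <cassert>
#include <cstdint>

// Virtual address of an item, given where the image is loaded and the
// item's offset inside the image (its RVA).
inline std::uint64_t rva_to_va(std::uint64_t load_base, std::uint32_t rva) {
    return load_base + rva;
}

// Where an absolute address that was computed for `preferred_base` ends up
// once the image is actually loaded at `actual_base`.
inline std::uint64_t rebase(std::uint64_t address,
                            std::uint64_t preferred_base,
                            std::uint64_t actual_base) {
    assert(address >= preferred_base);  // the address must belong to the image
    return address - preferred_base + actual_base;
}
```

With the values from the question, rva_to_va(0x400000, 0x1000) gives 0x401000, and rebase(0x401000, 0x400000, 0x900000) gives 0x901000. Every absolute reference stored in the image needs the same adjustment, which is exactly the job of the relocation table and the reason a stripped table makes moving the image so painful.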

Understanding Dynamic Library loading in Linux

I am trying to understand Dynamic Library loading in Linux from here [1] and want to clarify the concept. Concretely, when a dynamic library is loaded in a process in a Linux environment, it is loaded at any point in the address space. Now, a library has a code segment, and a data segment. The code segment's address is not defined pre-linking so it is 0x0000000 while for data segment, some number is defined to be an address.
But here is the trick, this address of data segment is not actually the true address. Actually, at whatever position code segment is loaded, data segment's pre-defined address is added to it.
Am I correct here?
One more thing from the referenced article. What does this statement mean?
However, we have the constraint that the shared library must still have a unique data instance in each process. While it would be possible to put the library data anywhere we want at runtime, this would require leaving behind relocations to patch the code and inform it where to actually find the data — destroying the always read-only property of the code and thus sharability.
[1] http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
Actually, at whatever position code segment is loaded, data segment's pre-defined address is added to it.
Yes. The "VirtAddr" of the data segment will be added to base address.
What does this statement mean?
It means that when the library accesses its own static data, the library code should not need relocations. Otherwise the dynamic linker would have to patch the binary code, which means those parts of the library code could no longer be shared between processes (if process1 loads library lib1 at 0x40000000 and process2 loads lib1 at 0x50000000, their data relocations will be different).
So a different solution is used in practice. Both library code and data are loaded together, and the offset between code and data is fixed in all cases. The "solution" comes right after the text you cited: http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
As you can see from the above headers, the solution is that the read-write data section is always put at a known offset from the code section of the library. This way, via the magic of virtual-memory, every process sees its own data section but can share the unmodified code. All that is needed to access data is some simple maths; address of thing I want = my current address + known fixed offset.
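You can watch the "base address + VirtAddr" rule at work from inside a process. This sketch assumes Linux with glibc and uses dl_iterate_phdr() to find the PT_LOAD segment that contains a given address; g_counter, Query, find_segment and locate are names made up for the example.

```cpp
#include <link.h>  // dl_iterate_phdr, ElfW, PT_LOAD, PF_W (glibc)

#include <cassert>
#include <cstddef>
#include <cstdint>

int g_counter = 42;  // initialized global, expected in a writable data segment

struct Query {
    std::uintptr_t address;  // address to look up
    bool found;              // true once a PT_LOAD segment contains it
    bool writable;           // PF_W flag of that segment
};

// Callback: scan one loaded object's program headers for the address.
static int find_segment(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    assert(data != nullptr);
    auto* q = static_cast<Query*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;
        // Runtime start of the segment: load base plus the segment's VirtAddr.
        const std::uintptr_t start =
            static_cast<std::uintptr_t>(info->dlpi_addr + ph.p_vaddr);
        if (q->address >= start && q->address < start + ph.p_memsz) {
            q->found = true;
            q->writable = (ph.p_flags & PF_W) != 0;
            return 1;  // stop iterating
        }
    }
    return 0;
}

// Find which loaded segment, if any, contains the pointer.
Query locate(const void* p) {
    Query q{reinterpret_cast<std::uintptr_t>(p), false, false};
    dl_iterate_phdr(find_segment, &q);
    return q;
}
```

locate(&g_counter) should report a writable segment. The distance between that data segment and the code segment of the same object is fixed by the p_vaddr values in the file, while dlpi_addr is the only part that varies from process to process.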

global variable defined in a dll and global variable of the host software

If I have a global variable defined in a DLL that my application loads, is this variable located in the same memory region as the other global variables defined in my application (so not directly in the DLL)?
Global data loaded as part of the EXE and global data loaded as part of the DLL both reside in the virtual memory space of the same process, though in different areas corresponding to the segments defined in those EXE and DLL files. Since they are in the same virtual memory space, code in the DLL can use a pointer to an EXE global that the EXE passes to it, and vice-versa.
The answer is yes.
MSDN quote:
"Every process that loads the DLL maps it into its virtual address space".
Go to this link and you'll find the answer to your doubt.
Good luck
Your tag indicates C++, but the answer may also be platform/OS dependent. Under Windows, each process gets its own copy of the DLL's data. Here's a snippet from the MSDN Run Time Behavior article:
Each time a new process attempts to use the DLL, the operating system creates a separate copy of the DLL's data: this is called process attach.
Within a single process, global data is, well, global.