Understanding Dynamic Library loading in Linux

Understanding Dynamic Library loading in Linux - c++

I am trying to understand Dynamic Library loading in Linux from here [1] and want to clarify the concept. Concretely, when a dynamic library is loaded in a process in a Linux environment, it is loaded at any point in the address space. Now, a library has a code segment, and a data segment. The code segment's address is not defined pre-linking so it is 0x0000000 while for data segment, some number is defined to be an address.
But here is the trick, this address of data segment is not actually the true address. Actually, at whatever position code segment is loaded, data segment's pre-defined address is added to it.
Am I correct here?
One more thing from the referenced article. What does this statement mean?
However, we have the constraint that the shared library must still have a unqiue data instance in each process. While it would be possible to put the library data anywhere we want at runtime, this would require leaving behind relocations to patch the code and inform it where to actually find the data — destroying the always read-only property of the code and thus sharability.
[1] http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html

Actually, at whatever position code segment is loaded, data segment's pre-defined address is added to it.
Yes. The "VirtAddr" of the data segment will be added to base address.
What does this statement mean?
It means that when library accesses its own static data, we should not use relocations in the library code. Otherwise linker may need to patch the binary code, which leads to unsharing some parts of library codes between processes (if process1 loads library lib1 at 0x40000000, and process2 loads lib1 at 0x50000000, their data relocations will be different).
So, different solution is used in real life. Both library code and data are loaded together, and the offset between code and data is fixed for all cases. There is the "solution" after text you cited: http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
As you can see from the above headers, the solution is that the read-write data section is always put at a known offset from the code section of the library. This way, via the magic of virtual-memory, every process sees its own data section but can share the unmodified code. All that is needed to access data is some simple maths; address of thing I want = my current address + known fixed offset.

Related

How does GDB know where an executable has been relocated?

I know modern OSs such as Linux don't always execute an application at the same address it was originally linked. When a debugger starts looking around though, it needs to know the relationship between the original link address and the final executing address. How does GDB calculate the offset?
Clarifications: I'm not talking about virtual memory. That is, I have (what I believe to be) a reasonable understanding of how Virtual memory works and am operating entirely with in that address space. I have symbols that are at one location when I dump the symbol table from the ELF, but at another location when I get their address out of memory.
In this particular case, I have a string which in the linked executable is at address 0x0E984141. In a dump of memory from that process, it is at address 0x0E3F2781. Everything in the .rodata section at least has been shifted by 0x5919C0. It appears to be something like Address Space Layout Randomization.

I know modern OSs such as Linux don't always execute an application at the same address it was originally linked.
This is only possible for position-independent executables (linked with -pie flag).
When a debugger starts looking around though, it needs to know the relationship between the original link address and the final executing address.
Correct.
How does GDB calculate the offset?
The same way GDB calculates the offset for shared libraries (a PIE executable is really a special case of a shared library). There is a defined interface between ld.so and GDB, consisting of _dl_debug_state() function (on which GDB sets an internal breakpoint, and which ld.so calls whenever it maps a new ELF image into the process), and struct r_debug. The latter points to a linked list of struct link_maps, and l_addr member of that struct is the offset between linked-at and loaded-at address.

If i understand what you are getting at, I think what you are actually referring to is Virtual Memory addressing This is not handled by GDB, it is handled by the operating system.
http://www.cs.utexas.edu/users/witchel/372/lectures/15.VirtualMemory.pdf

On Linux, every process has its own address space in virtual memory.
The ELF executable contains a header describing the segments in memory (and their corresponding sections in the executable).

Base dll address always different and hash mismatch

I made a program that reads X bytes from a loaded dll (module) of a process and hashes them to compare them to a clean hash which is hardcoded. The base address of the module is always the same (tested at a few different computers by different people on XP and 7) and the hash is also always the same.
But for one person the base address is always different and the hash is also always different (different on every run). He is using Windows 7 Ultimate.
My questions are:
Why would the base address of the module always be different? I know that dlls can be loaded at different addresses but what triggers this behavior? (Is DLL always have the same Base Address?) The base address is always of type 0x02XXXXXX while the never-changing address everyone else gets is 0x6F000000.
Why does the hash mismatch? Even if the module is loaded at different address, I still read the same number of bytes from base+someoffset. Not only is the hash different, it is different everytime you run the program. Because of this I suspect that the base address is actually wrong and something fishy is going on. I compared the md5 of my dll and his dll and they are the same so the library being loaded is defenetly identical.
The steps taken in the code:
Get a process handle (CreateToolhelp32Snapshot, Process32Next)
Enumerate the loaded modules (EnumProcessModules)
Find the specific module by name (GetModuleFileNameEx) and get the handle
Add an additional offset to the module base address (offset within a module)
Read X bytes from the module with ReadProcessMemory(hProcess, base_of_module+some_additional_offset, dllBuffer_to_read_into, 0x100000, &numRead) where 0x100000 does not overflow the module size
What this program is doing is comparing the in-memory dll content against a "clean" hash to discover tampering from malware/hacks/etc.

Your approach cannot hope to succeed. The base address of a DLL is only a guide to the loader. The loader may choose to load the DLL at that address. And if it does so it does not need to fix up any absolute references.
However, if the requested address is not available (something else in the process has already reserved the requested address range) or the loader chooses not to use the requested address (ASLR for instance), then the DLL will be loaded at some other address. And then the relocation table will be used to modify absolute references.
In order for your hash calculation to be robust, you would need to account for the relocations. You could, in principle, read the relocation table, and account for the relocations when performing the hash calculation. However, that is likely to be very tricky to get right.

Some other DLL, configured to load in every process, is loaded on that system. This can e.g. happen if you install webcam or mouse software, these tend to force their DLLs to be loaded in every process. Of course, if this DLL is loaded at an address preventing your preferred base address from being used, your DLL will be relocated.
Relocations. When a DLL is loaded, the .reloc section is parsed by the loader, and corrections to absolute addresses are written directly to the loaded DLL image. In order to create a correct hash, you must also read the relocation directory and correct for these loader modifications of the DLL.

The most likely cause is EMET, the Enhanced Mitigation Experience Toolkit from Microsoft.
One of the things EMET does is to enforce ASLR (Address Space Layout Randomization), i.e., it forces all DLLs to be loaded at random addresses, even if they aren't configured to use ASLR. This make it considerably more difficult for an attacker to exploit vulnerabilities.

Is it possible to change the entry point of a process from a DLL?

The default entry point for most application processes is usually 0x401000.
Is there any way we could shift or change the entry point of a process? For example, if I wanted to change the entry point to 0x901000 externally using a DLL (assuming that the process loaded the DLL via C++)?
I'm trying to create a DLL to edit the process's default entry point.

Yes, you can change ImageBase in Optional Header of Portable Executable, if your linker allows this.
Most linkers set ImageBase=0x10000 when linking executable and 0x400000 when linking DLL. However, this number is chosen arbitrarily (I guess because it is easy to remember and looks good in debuggers) and it may be disobeyed by the loader if the memory is already occupied.
See http://msdn.microsoft.com/en-us/library/ms809762.aspx
Table 3. paragraph IMAGE_OPTIONAL_HEADER.ImageBase:
When the linker creates an executable, it assumes that the file will be memory-mapped to a specific location in memory. That address is stored in this field, assuming a load address allows linker optimizations to take place. If the file really is memory-mapped to that address by the loader, the code doesn't need any patching before it can be run. In executables produced for Windows NT, the default image base is 0x10000. For DLLs, the default is 0x400000. In Windows 95, the address 0x10000 can't be used to load 32-bit EXEs because it lies within a linear address region shared by all processes. Because of this, Microsoft has changed the default base address for Win32 executables to 0x400000. Older programs that were linked assuming a base address of 0x10000 will take longer to load under Windows 95 because the loader needs to apply the base relocations.

On Windows, the default load address for EXEs is 0x400000 - so that's where that part of 0x401000 comes from.
The 0x1000 component is the offset into the image in memory where (usually) the text segment that hold the bulk of the code starts. That's where this particular program's entry point is.
That offset is a field in the PE header, as is indeed the default load address of 0x400000. Both can be changed, but be aware that for EXEs, relocation information is often stripped: Since the default load address is always guaranteed to be free when a new process is first created, relocation information is often assumed to not be needed for EXEs.
If that is the case for your EXE then you can't change the load address without doing major surgery to the image to manually identify and fix up any references that are relative to the assumed 0x400000 load address used during compilation/linking.

How are global variables in shared libraries linked?

Suppose I have shared library with this function where "i" is some global variable.
int foo() {
return i++;
}
When I call this function from multiple processes the value of "i" in each process is independent on the other processes.
This behavior is quite expected.
I was just wondering how is usually this behavior implemented by the linker? From my understanding the code is shared between processes, so the variable has to have the same virtual address in all address spaces of every program that uses this library. That condition seems quite difficult to accomplish to me so I guess I am missing something here and it is done differently.
Can I get some more detailed info on this subject?

The dynamic linking process at run time (much the same as the static linking process), allocates separate data (and bss) segments for each process, and maps those into the process address space. Only the text segments are shared between processes. This way, each process gets its own copy of static data.

the code is shared between processes, so the variable has to have the
same virtual address in all address spaces of every program that uses
this library
The code is not shared the way you think. Yes the dynamic shared object is loaded only once but the memory references or the stack or the heap that code in the so uses is not shared. Only the section that contains the code is shared.

Each process has it's own unique address space, so when a process access the variable it can have different values then the other process. If the process should share the same memory, they would have to specifically set this up. A shared library is not enough for that.

Converting a string into a function in c++

I have been looking for a way to dynamically load functions into c++ for some time now, and I think I have finally figure it out. Here is the plan:
Pass the function as a string into C++ (via a socket connection, a file, or something).
Write the string into file.
Have the C++ program compile the file and execute it. If there are any errors, catch them and return it.
Have the newly executed program with the new function pass the memory location of the function to the currently running program.
Save the location of the function to a function pointer variable (the function will always have the same return type and arguments, so
this simplifies the declaration of the pointer).
Run the new function with the function pointer.
The issue is that after step 4, I do not want to keep the new program running since if I do this very often, many running programs will suck up threads. Is there some way to close the new program, but preserve the memory location where the new function is stored? I do not want it being overwritten or made available to other programs while it is still in use.
If you guys have any suggestions for the other steps as well, that would be appreciated as well. There might be other libraries that do things similar to this, and it is fine to recommend them, but this is the approach I want to look into — if not for the accomplishment of it, then for the knowledge of knowing how to do so.
Edit: I am aware of dynamically linked libraries. This is something I am largely looking into to gain a better understanding of how things work in C++.

I can't see how this can work. When you run the new program it'll be a separate process and so any addresses in its process space have no meaning in the original process.
And not just that, but the code you want to call doesn't even exist in the original process, so there's no way to call it in the original process.
As Nick says in his answer, you need either a DLL/shared library or you have to set up some form of interprocess communication so the original process can send data to the new process to be operated on by the function in question and then sent back to the original process.

How about a Dynamic Link Library?
These can be linked/unlinked/replaced at runtime.
Or, if you really want to communicated between processes, you could use a named pipe.
edit- you can also create named shared memory.

for the step 4. we can't directly pass the memory location(address) from one process to another process because the two process use the different virtual memory space. One process can't use memory in other process.
So you need create a shared memory through two processes. and copy your function to this memory, then you can close the newly process.
for shared memory, if in windows, looks Creating Named Shared Memory
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx
after that, you still create another memory space to copy function to it again.
The idea is that the normal memory allocated only has read/write properties, if execute the programmer on it, the CPU will generate the exception.
So, if in windows, you need use VirtualAlloc to allocate the memory with the flag,PAGE_EXECUTE_READWRITE (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx)
void* address = NULL;
address= VirtualAlloc(NULL,
sizeof(emitcode),
MEM_COMMIT|MEM_RESERVE,
PAGE_EXECUTE_READWRITE);
After copy the function to address, you can call the function in address, but need be very careful to keep the stack balance.

Dynamic library are best suited for your problem. Also forget about launching a different process, it's another problem by itself, but in addition to the post above, provided that you did the virtual alloc correctly, just call your function within the same "loadder", then you shouldn't have to worry since you will be running the same RAM size bound stack.
The real problems are:
1 - Compiling the function you want to load, offline from the main program.
2 - Extract the relevant code from the binary produced by the compiler.
3 - Load the string.
1 and 2 require deep understanding of the entire compiler suite, including compiler flag options, linker, etc ... not just the IDE's push buttons ...
If you are OK, with 1 and 2, you should know why using a std::string or anything but pure char *, is an harmfull.
I could continue the entire story but it definitely deserve it's book, since this is Hacker/Cracker way of doing things I strongly recommand to the normal user the use of dynamic library, this is why they exists.

Usually we call this code injection ...
Basically it is forbidden by any modern operating system to access something for exceution after the initial loading has been done for sake of security, so we must fall back to OS wide validated dynamic libraries.
That's said, one you have valid compiled code, if you realy want to achieve that effect you must load your function into memory then define it as executable ( clear the NX bit ) in a system specific way.
But let's be clear, your function must be code position independant and you have no help from the dynamic linker in order to resolve symbol ... that's the hard part of the job.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js