DLL loading and system image space

DLL loading and system image space - c++

DLL’s are only ever really loaded once. The dynamic loader will link and redirect calls if your app starts using a specific DLL, something from MS-Office for example.
However, WHEN does the repeated referencing of a DLL for various different users and apps, on the system push a DLL image into system space, so that ALL apps can use it?
Otherwise, does the loaded image remain in the user space?
Bearing in mind: All apps actually look at the SAME 2gb system space, and this is virtualized for them by virtual addressing,
OR, Does the linker always load DLLS into the Kernel space, so that all apps can use them.

DLL’s are only ever really loaded once.
This is not correct. They are mapped into the virtual address space either when the process starts by the loader of the operating system, or when you ask for it through API functions like LoadLibrary. Each process gets a fresh copy and the DLL is initialized each time this happens.
There is no global "system space" which all processes use at once. Each process has their own private virtual address range (which is 4GB with normally 2GB usable memory on 32 bit Windows). If you overwrite parts of a DLL in your own virtual memory, copies of the DLL in other processes are not affected. One process could easily crash the whole system if it weren't like this.

Related

How are DLLs mapped into current programs virtual address space

When I load a DLL in program, how does that occur in memory? Does it get loaded into my Virtual Address Space? If it does, where are the text and data segments stored? I have a 32-bit program I'm maintaining, which uses a large part of the available heap for image processing routines, and I want to know how much I should worry about loading DLLs which themselves might use a lot of space.

Yes: everything that your process needs to access must be in its adress space. This applies to your code and to your data as well.
Here you'll find more about the anatomy of process memory and adress space
and here it's explained that dll are loaded into the virtual adress space.
Remark: the dll might be shared between several processes: it is then loaded only once in memory by the OS. But every process using it could potentially see it at a different place in its own virtual adress space (see also this SO answer about relative virtual adresses).

Are shared objects/DLLs loaded by different processes into different areas of memory?

I'm trying to figure out how an operating system handles multiple unrelated processes loading the same DLL/shared library. The OSes I'm concerned with are Linux and Windows, but to a lesser extent Mac as well. I presume the answers to my questions will be identical for all operating systems.
I'm particularly interested in regards to explicit linking, but I'd also like to know for implicit linking. I presume the answers for both will also be identical.
This is the best explanation I've found so far concerning Windows:
"The system maintains a per-process reference count on all loaded modules. Calling LoadLibrary increments the reference count. Calling the FreeLibrary or FreeLibraryAndExitThread function decrements the reference count. The system unloads a module when its reference count reaches zero or when the process terminates (regardless of the reference count)." - http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175%28v=vs.85%29.aspx
But it leaves some questions.
1.) Do unrelated processes load the same DLL redundantly (that is, the DLL exists more than once in memory) instead of using reference counting? ( IE, into each process's own "address space" as I think I understand it )
if the DLL is unloaded as soon as a process is terminated, that leads me to believe the other processes using exact same DLL will have a redundantly loaded into memory, otherwise the system should not be allowed to ignore the reference count.
2.) if that is true, then what's the point of reference counting DLLs when you load them multiple times in the same process? What would be the point of loading the same DLL twice into the same process? The only feasible reason I can come up with is that if an EXE references two DLLs, and one of the DLLs references the other, there will be at least two LoadLibrar() and two FreeLibrary() calls for the same library.
I know it seems like I'm answering my own questions here, but I'm just postulating. I'd like to know for sure.

The shared library or DLL will be loaded once for the code part, and multiple times for any writeable data parts [possibly via "copy-on-write", so if you have a large chunk of memory which is mostly read, but some small parts being written, all the DLL's can use the same pieces as long as they haven't been changed from the original value].
It is POSSIBLE that a DLL will be loaded more than once, however. When a DLL is loaded, it is loaded a base-address, which is where the code starts. If we have some process, which is using, say, two DLL's that, because of their previous loading, use the same base-address [because the other processes using this doesn't use both], then one of the DLL's will have to be loaded again at a different base-address. For most DLL's this is rather unusual. But it can happen.
The point of referencecounting every load is that it allows the system to know when it is safe to unload the module (when the referencecount is zero). If we have two distinct parts of the system, both wanting to use the same DLL, and they both load that DLL, you don't really want to cause the system to crash when the first part of the system closes the DLL. But we also don't want the DLL to stay in memory when the second part of the system has closed the DLL, because that would be a waste of memory. [Imagine that this application is a process that runs on a server, and new DLL's are downloaded every week from a server, so each week, the "latest" DLL (which has a different name) is loaded. After a few months, you'd have the entire memory full of this applications "old, unused" DLL's]. There are of course also scenarios such as what you describe, where a DLL loads another DLL using the LoadLibrary call, and the main executable loads the very same DLL. Again, you do need two FreeLibrary calls to close it.

Are memory modules mapped into process' virtual space?

I see that on Windows the function EnumProcessModules returns a number of modules loaded for a specified process (some of these should be system dlls like guard32.dll, version.dll, etc..)
My question is: are these modules mapped into the process' virtual space? Can I jump to an instruction located into one of these modules (of course knowing the address) from the main app code?

Yes, the DLL's should be mapped into the process virtual address space. The mapping may not be backed by a real physical page if the code in that page has not been executed, and of course executing "random" bits of code without the right initialization or setup for the code to execute properly (e.g calling the processing function that uses some data that needs to be allocated in another function) will clearly end badly in some defintion of bad. Also bear in mind that the DLL may well be loaded at different addresses at different times you run the same code, etc, so you can't rely on the address of the DLL being constant - and it may well be completely different in another machine.

Yes, just call GetProcAddress using the module which you got from EnumProcessModules. GetProcAddress calculates the function offset within the module.

Yes, any DLL code that can be invoked directly from your own executable must be mapped into your process space. You can get a precise chart of your process virtual memory space using SysInternal's VMMap utility: http://technet.microsoft.com/en-us/sysinternals/dd535533
As mentioned in other answers, the virtual address space is largely, if not entirely, dynamic.
There are cases where certain shared libraries are not directly accessible from your process. These are typically sandboxed (secured) kernel or driver libraries, which are invoked through a special secure layer/API that performs parameter validation and then executes a ring/context switch into a different virtual process address space, or passes the command on via a secured inter-thread communication queue. These are expensive operations so they are typically reserved for use only when there are benefits to system stability.

is it possible for function address in DLL to change if it loaded to application?

I disassembled a DLL and see there some functions. I found the function that I need and it's address is 0x10001340.
Would this address stay the same, if I load this dll into my application? So would it be possible for me to call that function by it's address from my application?
I am asking because I am not sure: what if when this dll loaded, some function in the main application already has the same address? So maybe the functions inside a dll can change addresses when loading or etc.

On Windows dlls have a preferential load address, but the loader is able to change all those references if it notices that such portion of the virtual address space is already used. This process is called "rebasing".
The "default" base address is specified at linking time (/BASE with the Microsoft linker), and it can be useful to set it to something different than the default if you plan to use the dll alongside with another one with the same base address; this speeds up the loading process, since the loader doesn't have to rebase one of the dlls at each load. (IIRC there are also tools that are able to rebase an existing dll and save the result on disk)
It's good to keep in mind that, from Windows Vista onwards, dlls compiled with a specified flag are loaded always at a random base address to avoid some kind of exploits.

It is extremely unlikely that you'll end up with the same address. The default /BASE argument for the linker for DLLs is 0x10000000, that's how your entrypoint ended up at that address. But there are many DLLs that are linked using the default setting, only one can actually get loaded at that address. All the other ones that get loaded later need to be re-based.
You could come up with a better value for /BASE, it is however never a guarantee that you get the load address you ask for.

As Matteo said, a DLL has a preferred load address (specified in the ImageBase field of the IMAGE_OPTIONAL_HEADER structure). When the system tries to load a DLL it will load it at this address if possible (unless address space randomisation is enabled) and no "patching" is required. If it can't load at the preferred address the DLL is relocated which will require any absolute references in the DLL to be patched to compensate for the relocation.
So to answer your question:
There is no guarantee that a DLL will be loaded at its preferred address. Once loaded subsequent loads will not load more copies of the DLL so the addresses will not change. However once unloaded (DLLs are reference counted) there is no guarantee it will be loaded at the same address next time.

Is a DLL loaded entirely or only some functions?

When a program uses a dynamic shared library, does it load the DLL entirely (so you can almost erase the DLL from disk during application is running) or does it load only a part of the DLL according to its need at each time during the runtime life of the application?

DLL gets loaded entirely. DLLs are same as EXEs in almost all aspect; the only big difference between them is, DLLs are not executable. It doesn't have main() function - the start of a program.

I don't know how the details work in Windows (in Linux I know the responsible code in the kernel quite well), but at least in *nix systems deleting a filesystem entry leaves the file contents intact as long there are file descriptor/handles opened on it.; only after closing the last file descriptor/handle the blocks on the storage device may get overwritten. Windows is POSIX certified, so it follows this behaviour.
DLLs are not loaded into preallocated memory. They're memory mapped. This causes kind of the reverse of swap memory. Instead of swapping RAM to a disk, the contents of the file are mapped into process address space and will end up in RAM through disk/file cache. The same goes for shared objects in *nix operating systems. But there are significant differences between Windows and *nix systems deal with relocations, symbol exports and so on.

It's being loaded entirely, as was pointed out. The special part is not that you can't run the DLL, it's that the memory pages of a DLL are usually shared across process boundaries.
Should a process attempt to write into a page, a copy of that page is taken and the copy is only visible to this process (it's called copy-on-write).
DLLs are PE files (i.e. the same as NT drivers or Win32 programs). They are loaded similarly to .exe files into Memory Mapped Files (MMFs, or "sections" in kernel mode parlance). This means that the DLL file is backing the MMF that represents the loaded DLL. This is the same as when passing a valid file handle (not INVALID_HANDLE_VALUE) to CreateFileMapping and it's also (part of) the reason why you can't delete the DLL while it is in use.
Also, there are some DLLs that contain no code at all. Such a DLL can then also be loaded into a process that was not made for the same processor. E.g. the x86 resource DLL loads fine into an x64 application.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js