Are memory modules mapped into process' virtual space? - c++

I see that on Windows the function EnumProcessModules retrieves a handle for each module loaded in a specified process (some of these will be system DLLs like guard32.dll, version.dll, etc.).
My question is: are these modules mapped into the process's virtual space? Can I jump to an instruction located in one of these modules (knowing its address, of course) from the main app code?

Yes, the DLLs should be mapped into the process's virtual address space. The mapping may not be backed by a real physical page if the code in that page has not been executed yet, and of course executing "random" bits of code without the right initialization or setup for the code to execute properly (e.g. calling a processing function that uses data that needs to be allocated by another function) will clearly end badly in some definition of bad. Also bear in mind that the DLL may well be loaded at different addresses on different runs of the same code, so you can't rely on the address of the DLL being constant, and it may well be completely different on another machine.

Yes, just call GetProcAddress using the module handle you got from EnumProcessModules. GetProcAddress resolves the function's offset within the module to an address you can call.
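A minimal sketch of that flow, assuming you link against psapi.lib (error handling trimmed):

    // Enumerate the current process's modules and resolve a function
    // address inside one of them.
    #include <windows.h>
    #include <psapi.h>
    #include <cstdio>

    int main() {
        HMODULE modules[1024];
        DWORD bytesNeeded = 0;
        HANDLE process = GetCurrentProcess();

        if (EnumProcessModules(process, modules, sizeof(modules), &bytesNeeded)) {
            DWORD count = bytesNeeded / sizeof(HMODULE);
            for (DWORD i = 0; i < count; ++i) {
                char name[MAX_PATH];
                if (GetModuleFileNameExA(process, modules[i], name, MAX_PATH))
                    printf("%p  %s\n", (void*)modules[i], name);
            }
        }

        // An HMODULE is the module's base address in this process's virtual
        // address space; GetProcAddress adds the export's offset to it.
        HMODULE kernel32 = GetModuleHandleA("kernel32.dll");
        FARPROC beep = GetProcAddress(kernel32, "Beep");
        printf("Beep lives at %p inside this process\n", (void*)beep);
        return 0;
    }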

Yes, any DLL code that can be invoked directly from your own executable must be mapped into your process space. You can get a precise chart of your process's virtual memory space using Sysinternals' VMMap utility: http://technet.microsoft.com/en-us/sysinternals/dd535533
As mentioned in other answers, the virtual address space is largely, if not entirely, dynamic.
There are cases where certain shared libraries are not directly accessible from your process. These are typically sandboxed (secured) kernel or driver libraries, which are invoked through a special secure layer/API that performs parameter validation and then executes a ring/context switch into a different virtual process address space, or passes the command on via a secured inter-thread communication queue. These are expensive operations so they are typically reserved for use only when there are benefits to system stability.

Related

How do DLLs handle concurrency from multiple processes?

I understand from Eric Lippert's answer that "two processes can share non-private memory pages. If twenty processes all load the same DLL, the processes all share the memory pages for that code. They don't share virtual memory address space, they share memory."
Now, if the same DLL file on the hard disk, once loaded into applications, shares the same physical memory (be it RAM or page file) but is mapped into different virtual memory address spaces, wouldn't that make it quite difficult to handle concurrency?
As I understand it, concurrency in C++ is mostly about threading: a process can start multiple threads, each of which can run on its own core, so when different threads call the DLL at the same time there might be data races, and we need mutexes, locks, signals, condition variables, and so on.
But how does a DLL handle multiple processes? The same kind of data race can happen, can't it? What are the tools to handle that? Still the same toolset?
Now, if the same DLL file on the hard disk, after loaded into applications, would share the same physical memory (be it RAM or page files), but mapped to different virtual memory address spaces, wouldn't that make it quite difficult to handle concurrency?
As other answers have noted, the concurrency issues are of no concern if the shared memory is never written after it is initialized, which is typically the case for DLLs. If you are attempting to alter the code or resources in a DLL by writing into memory, odds are good you have a bad pointer somewhere and the best thing to do is to crash with an access violation.
However I wanted to also briefly follow up on your concern:
... mapped to different virtual memory address spaces ...
In practice we try very hard to avoid this happening because when it does, there can be a serious user-noticeable performance problem when loading code pages for the first time. (And of course a possible large increase in working set, which causes other performance problems.)
The code in a DLL often contains hard-coded virtual memory addresses, on the assumption that the code will be loaded into a known-at-compile-time virtual memory "base" address. If this assumption is violated at runtime -- because there's another DLL already there, for example -- then all those hard-coded addresses need to be patched at runtime, which is expensive.
If you want some historical details, see Raymond's article on the subject: https://blogs.msdn.microsoft.com/oldnewthing/20041217-00/?p=36953/
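If you want to observe rebasing yourself, a minimal sketch is to compare a loaded module's preferred base (the ImageBase field of its PE header) with the address it actually got; user32.dll is just an example:

    #include <windows.h>
    #include <cstdio>

    int main() {
        HMODULE mod = LoadLibraryA("user32.dll");
        if (!mod) return 1;

        // The module handle is the actual load address; the PE optional
        // header records the address the linker assumed at build time.
        BYTE* base = reinterpret_cast<BYTE*>(mod);
        auto dos = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
        auto nt  = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);

        printf("preferred base: %p\n", (void*)nt->OptionalHeader.ImageBase);
        printf("actual base:    %p\n", (void*)mod);
        // With ASLR these usually differ: the loader patched the hard-coded
        // addresses using the .reloc section when it rebased the module.
        FreeLibrary(mod);
        return 0;
    }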
DLLs contain multiple "segments" (sections), and each segment has a descriptor telling Windows its Characteristics. This is a 32-bit DWORD of flags. Code segments obviously have the code bit set, and generally also the shareable bit. Read-only data can also be shareable, whereas writeable data generally does not have the shareable flag.
Now, you can set an unusual combination of characteristics on an extra segment: writeable and shareable. That is not the default, and indeed it might cause race conditions. So the final answer to your question is: the problem is avoided chiefly by the default characteristics of segments; and secondly, any DLL that has a segment with non-standard characteristics must deal with the self-inflicted problems.
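A minimal sketch of reading those per-section Characteristics flags from a loaded module's PE headers (kernel32.dll is just an example):

    #include <windows.h>
    #include <cstdio>

    int main() {
        BYTE* base = reinterpret_cast<BYTE*>(GetModuleHandleA("kernel32.dll"));
        auto dos = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
        auto nt  = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);
        IMAGE_SECTION_HEADER* sec = IMAGE_FIRST_SECTION(nt);

        // Print the code / writeable / shareable bits for each section.
        for (WORD i = 0; i < nt->FileHeader.NumberOfSections; ++i, ++sec) {
            printf("%-8.8s code=%d write=%d shared=%d\n",
                   (const char*)sec->Name,
                   (sec->Characteristics & IMAGE_SCN_CNT_CODE) != 0,
                   (sec->Characteristics & IMAGE_SCN_MEM_WRITE) != 0,
                   (sec->Characteristics & IMAGE_SCN_MEM_SHARED) != 0);
        }
        return 0;
    }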

Discarding DLL's resource after using it

I'm creating a dll with an embedded binary resource. Currently when I load this DLL it gets memory mapped into my process address space. The problem is that the embedded binary resource is huge and I don't want to keep it around once I'm done using it.
I tried looking up documentation regarding this, and apparently there are sections in the PE file which don't get memory mapped (the relocation section, for example). Also, I can create new sections and flag them IMAGE_SCN_MEM_DISCARDABLE, but this flag is ignored outside of kernel mode.
There was a Win API function which supported freeing resources on 16-bit Windows but doesn't work from 32-bit Windows onward. The documentation says: "This function is obsolete and is only supported for backward compatibility with 16-bit Windows. For 32-bit Windows applications, it is not necessary to free the resources loaded using LoadResource. If used on 32 or 64-bit Windows systems, this function will return FALSE". I don't know exactly what they mean by that, but it seems like they expect resources to be small enough to be accommodated in the address space.
Is there any way for me to discard the resources I load after I'm done using them?
The system will discard them if it needs to. So long as you are not referring to the memory, it can be paged out if the system needs the physical memory for something else. So it won't prevent physical memory from going to whatever needs it.
That said, linked resources are not intended to be huge. The point is that a module is mapped into a contiguous range of memory. If your module is really huge, it may be impossible to find such a contiguous range. What's more, the module's address range is reserved for as long as the module is loaded. That means that nothing else in the process can use that virtual memory address range. So even if a contiguous address range can be found, it is reserved for the module for the module's entire lifetime and cannot be used for anything else. This can easily become a problem for 32-bit applications.
So, by putting the huge resource in memory you won't incur a long-standing drain on physical resources, but you will put an unavoidable constraint on virtual memory address space resources.
The conclusion to draw is that such huge objects should be held in external files and not linked to the module as a resource. If you absolutely must use a resource in a PE module, then put the resource into a separate DLL. Load the DLL with LoadLibrary, pull out the resource using the module handle you got from LoadLibrary, and then unload the DLL with FreeLibrary.
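A minimal sketch of that pattern; "bigdata.dll" and the resource name are hypothetical placeholders:

    #include <windows.h>
    #include <cstdio>

    int main() {
        // LOAD_LIBRARY_AS_DATAFILE maps the file for resource access only
        // (no code runs), which suits a resource-only DLL.
        HMODULE mod = LoadLibraryExA("bigdata.dll", nullptr, LOAD_LIBRARY_AS_DATAFILE);
        if (!mod) return 1;

        HRSRC res = FindResourceA(mod, "HUGEBLOB", MAKEINTRESOURCEA(10) /* RT_RCDATA */);
        if (res) {
            HGLOBAL handle = LoadResource(mod, res);  // maps the resource data
            void*   data   = LockResource(handle);    // pointer into the mapping
            DWORD   size   = SizeofResource(mod, res);
            printf("resource is %lu bytes at %p\n", size, data);
            // ... use the data; don't keep pointers past FreeLibrary ...
        }

        FreeLibrary(mod);  // unmaps the DLL, releasing the whole address range
        return 0;
    }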

DLL loading and system image space

DLLs are only ever really loaded once. The dynamic loader will link and redirect calls if your app starts using a specific DLL, something from MS Office for example.
However, WHEN does the repeated referencing of a DLL by various different users and apps on the system push a DLL image into system space, so that ALL apps can use it?
Otherwise, does the loaded image remain in the user space?
Bearing in mind that all apps actually look at the SAME 2 GB system space, and this is virtualized for them by virtual addressing.
OR, does the loader always load DLLs into kernel space, so that all apps can use them?
DLLs are only ever really loaded once.
This is not correct. They are mapped into the virtual address space either when the process starts by the loader of the operating system, or when you ask for it through API functions like LoadLibrary. Each process gets a fresh copy and the DLL is initialized each time this happens.
There is no global "system space" which all processes use at once. Each process has its own private virtual address range (4 GB on 32-bit Windows, of which normally 2 GB is usable by the process). If you overwrite parts of a DLL in your own virtual memory, the copies of the DLL in other processes are not affected. If it weren't like this, one process could easily crash the whole system.
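The mechanism behind that privacy is copy-on-write. A minimal sketch that walks one loaded module's pages and shows which ones are mapped copy-on-write (kernel32.dll is just an example):

    #include <windows.h>
    #include <cstdio>

    int main() {
        BYTE* addr = reinterpret_cast<BYTE*>(GetModuleHandleA("kernel32.dll"));
        MEMORY_BASIC_INFORMATION mbi;

        // Writable image pages start out PAGE_WRITECOPY: the physical page
        // is shared between processes until one of them writes to it, at
        // which point that process silently gets its own private copy.
        while (VirtualQuery(addr, &mbi, sizeof(mbi)) == sizeof(mbi) &&
               mbi.Type == MEM_IMAGE) {
            printf("%p  %8zu KB  protect=0x%04lx%s\n",
                   mbi.BaseAddress, mbi.RegionSize / 1024, mbi.Protect,
                   (mbi.Protect & (PAGE_WRITECOPY | PAGE_EXECUTE_WRITECOPY))
                       ? "  (copy-on-write)" : "");
            addr = static_cast<BYTE*>(mbi.BaseAddress) + mbi.RegionSize;
        }
        return 0;
    }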

Are shared objects/DLLs loaded by different processes into different areas of memory?

I'm trying to figure out how an operating system handles multiple unrelated processes loading the same DLL/shared library. The OSes I'm concerned with are Linux and Windows, but to a lesser extent Mac as well. I presume the answers to my questions will be identical for all operating systems.
I'm particularly interested in regards to explicit linking, but I'd also like to know for implicit linking. I presume the answers for both will also be identical.
This is the best explanation I've found so far concerning Windows:
"The system maintains a per-process reference count on all loaded modules. Calling LoadLibrary increments the reference count. Calling the FreeLibrary or FreeLibraryAndExitThread function decrements the reference count. The system unloads a module when its reference count reaches zero or when the process terminates (regardless of the reference count)." - http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175%28v=vs.85%29.aspx
But it leaves some questions.
1.) Do unrelated processes load the same DLL redundantly (that is, does the DLL exist more than once in memory) instead of using reference counting? (i.e., into each process's own "address space", as I think I understand it)
If the DLL is unloaded as soon as a process terminates, that leads me to believe the other processes using the exact same DLL will have it redundantly loaded into memory; otherwise the system should not be allowed to ignore the reference count.
2.) If that is true, then what's the point of reference counting DLLs when you load them multiple times in the same process? What would be the point of loading the same DLL twice into the same process? The only feasible reason I can come up with is that if an EXE references two DLLs, and one of the DLLs references the other, there will be at least two LoadLibrary() and two FreeLibrary() calls for the same library.
I know it seems like I'm answering my own questions here, but I'm just postulating. I'd like to know for sure.
The shared library or DLL will be loaded once for the code part, and multiple times for any writeable data parts [possibly via "copy-on-write", so if you have a large chunk of memory which is mostly read but has some small parts being written, all the processes can share the same pages as long as they haven't been changed from the original value].
It is POSSIBLE for a DLL to be loaded more than once, however. When a DLL is loaded, it is loaded at a base address, which is where the code starts. If we have some process which uses, say, two DLLs that, because of their previous loading, want the same base address [because the other processes using them don't use both], then one of the DLLs will have to be loaded again at a different base address. For most DLLs this is rather unusual, but it can happen.
The point of reference counting every load is that it allows the system to know when it is safe to unload the module (when the reference count reaches zero). If two distinct parts of the system both want to use the same DLL and both load it, you don't really want the system to crash when the first part closes the DLL. But we also don't want the DLL to stay in memory after the second part has closed it, because that would be a waste of memory. [Imagine an application running as a server process where a new DLL is downloaded every week, so each week the "latest" DLL (which has a different name) is loaded. After a few months, memory would be full of the application's old, unused DLLs.] There are of course also scenarios such as the one you describe, where a DLL loads another DLL using the LoadLibrary call, and the main executable loads the very same DLL. Again, you need two FreeLibrary calls to close it.
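A minimal sketch of the per-process reference count in action (winmm.dll is just an example of a DLL that a plain console app doesn't already have loaded):

    #include <windows.h>
    #include <cstdio>

    int main() {
        HMODULE a = LoadLibraryA("winmm.dll");  // refcount: 1, DLL gets mapped
        HMODULE b = LoadLibraryA("winmm.dll");  // refcount: 2, same mapping
        printf("same handle: %s\n", a == b ? "yes" : "no");  // "yes"

        FreeLibrary(b);  // refcount: 1, still mapped
        printf("still mapped: %s\n",
               GetModuleHandleA("winmm.dll") ? "yes" : "no");  // "yes"

        FreeLibrary(a);  // refcount: 0, unmapped now
        printf("still mapped: %s\n",
               GetModuleHandleA("winmm.dll") ? "yes" : "no");  // "no"
        return 0;
    }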

How can you track memory across DLL boundaries

I want performant run-time memory metrics, so I wrote a memory tracker based on overloading new and delete. It basically lets you walk your allocations in the heap and analyze everything about them: fragmentation, size, time, number, call stack, etc. But it has two fatal flaws: it can't track memory allocated in other DLLs, and when ownership of objects is passed to DLLs or vice versa, crashes ensue. And some smaller flaws: if a user uses malloc instead of new, it's untracked; likewise a class-defined operator new/delete.
How can I eliminate these flaws? I think I must be going about this fundamentally incorrectly by overloading new/delete. Is there a better way?
The right way to implement this is to use detours and a separate tool that runs in its own process. The procedure is roughly the following (a sketch of the injection steps follows below):
Create a memory allocation in the remote process.
Place there the code of a small loader that will load your DLL.
Call the CreateRemoteThread API to run your loader.
From inside the loaded DLL, establish detours (hooks, interceptors) on the alloc/dealloc functions.
Process the calls and track activity.
If you implement your tool this way, it will not matter whether the memory allocation routines are called from a DLL or directly from the EXE. Plus, you can track activity in any process, not only ones you compiled yourself.
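A heavily abbreviated sketch of those steps; "tracker.dll" is a hypothetical DLL whose DllMain installs the detours, and real code needs matching bitness, privileges, and error handling:

    #include <windows.h>
    #include <cstring>

    bool InjectDll(DWORD pid, const char* dllPath) {
        HANDLE proc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
        if (!proc) return false;

        // 1. Allocate memory in the remote process and write the DLL path there.
        SIZE_T len = strlen(dllPath) + 1;
        void* remote = VirtualAllocEx(proc, nullptr, len,
                                      MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        WriteProcessMemory(proc, remote, dllPath, len, nullptr);

        // 2. Use LoadLibraryA itself as the "small loader": kernel32 is mapped
        //    at the same address in every process in a boot session, so this
        //    function pointer is also valid in the target.
        auto loader = reinterpret_cast<LPTHREAD_START_ROUTINE>(
            GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA"));

        // 3. Run the loader in the target; its argument is the path we wrote.
        HANDLE thread = CreateRemoteThread(proc, nullptr, 0, loader, remote, 0, nullptr);
        if (thread) { WaitForSingleObject(thread, INFINITE); CloseHandle(thread); }

        VirtualFreeEx(proc, remote, 0, MEM_RELEASE);
        CloseHandle(proc);
        return thread != nullptr;
    }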
MS Windows allows checking the contents of the virtual address space of a remote process. You can summarize the use of virtual address space collected this way in a histogram showing how many virtual allocations of what size exist in your target process.
As an example, an overview of the virtual address space usage in a 32-bit MSVC DevEnv looks like this: blue stripes are committed pieces of memory, magenta stripes are reserved, and green is the unoccupied part of the address space. The lower addresses are pretty fragmented, while the middle area is not; the blue lines at high addresses are the various DLLs loaded into the process.
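A minimal sketch of that kind of survey using VirtualQueryEx; bucketing regions by size to build the actual histogram is left out:

    #include <windows.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char** argv) {
        DWORD pid = argc > 1 ? (DWORD)atoi(argv[1]) : GetCurrentProcessId();
        HANDLE proc = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
        if (!proc) return 1;

        SIZE_T committed = 0, reserved = 0, freeBytes = 0;
        MEMORY_BASIC_INFORMATION mbi;
        BYTE* addr = nullptr;

        // Walk the target's address space region by region.
        while (VirtualQueryEx(proc, addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
            if (mbi.State == MEM_COMMIT)       committed += mbi.RegionSize;
            else if (mbi.State == MEM_RESERVE) reserved  += mbi.RegionSize;
            else                               freeBytes += mbi.RegionSize;
            addr = static_cast<BYTE*>(mbi.BaseAddress) + mbi.RegionSize;
            if (addr == nullptr) break;  // wrapped around the address space
        }

        printf("committed %zu KB, reserved %zu KB, free %zu KB\n",
               committed / 1024, reserved / 1024, freeBytes / 1024);
        CloseHandle(proc);
        return 0;
    }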
You should find out the common memory management routines that are called by new/delete and malloc/free, and intercept those. It is usually malloc/free in the end, but check to make sure.
On UNIX, I would use LD_PRELOAD with a library that re-implements those routines. On Windows you have to hack a little bit, but this link seems to give a good description of the process. It basically suggests using Detours from Microsoft Research.
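A minimal sketch of the LD_PRELOAD approach on Linux. Note that a production shim must guard against re-entrancy, since dlsym and fprintf can themselves allocate:

    // Build:  g++ -shared -fPIC -o libtrack.so track.cpp -ldl
    // Use:    LD_PRELOAD=./libtrack.so ./your_program
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE  // for RTLD_NEXT
    #endif
    #include <dlfcn.h>
    #include <cstddef>
    #include <cstdio>

    extern "C" void* malloc(size_t size) {
        static auto real_malloc =
            reinterpret_cast<void* (*)(size_t)>(dlsym(RTLD_NEXT, "malloc"));
        void* p = real_malloc(size);
        fprintf(stderr, "malloc(%zu) = %p\n", size, p);  // record the allocation
        return p;
    }

    extern "C" void free(void* p) {
        static auto real_free =
            reinterpret_cast<void (*)(void*)>(dlsym(RTLD_NEXT, "free"));
        fprintf(stderr, "free(%p)\n", p);                // record the release
        real_free(p);
    }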
Passing ownership of objects between modules is fundamentally flawed. It showed up with your custom allocator, but there are plenty of other cases that will fail also:
compiler upgrades, and recompiling only some DLLs
mixing compilers from different vendors
statically linking the runtime library
Just to name a few. Free every object from the same module that allocated it (often by exporting a deletion function, such as IUnknown::Release()).
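A minimal sketch of that rule, with illustrative names: the DLL exports a matched create/destroy pair, so the object is always freed by the same runtime that allocated it.

    // --- shared header (a real header would switch to dllimport on the
    //     consuming side) ---
    struct Widget;  // opaque to the EXE
    extern "C" __declspec(dllexport) Widget* CreateWidget();
    extern "C" __declspec(dllexport) void    DestroyWidget(Widget* w);

    // --- inside the DLL ---
    struct Widget { int state = 0; };
    extern "C" Widget* CreateWidget()           { return new Widget(); }
    extern "C" void    DestroyWidget(Widget* w) { delete w; }  // same heap/CRT

    // --- in the EXE ---
    // Widget* w = CreateWidget();
    // ...use w...
    // DestroyWidget(w);  // never `delete w` here: the EXE's CRT may use a
    //                    // different heap than the DLL's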