Discarding a DLL's resource after using it - C++

I'm creating a dll with an embedded binary resource. Currently when I load this DLL it gets memory mapped into my process address space. The problem is that the embedded binary resource is huge and I don't want to keep it around once I'm done using it.
I tried looking up documentation on this, and apparently there are sections in the PE file which don't get memory mapped (the relocation section, for example). Also, I can create new sections and flag them IMAGE_SCN_MEM_DISCARDABLE, but this flag is ignored outside of kernel mode.
There was a Win API function (FreeResource) which supported freeing resources on 16-bit Windows but doesn't work from 32-bit onward. The documentation says "This function is obsolete and is only supported for backward compatibility with 16-bit Windows. For 32-bit Windows applications, it is not necessary to free the resources loaded using LoadResource. If used on 32 or 64-bit Windows systems, this function will return FALSE". I don't know exactly what they mean by that, but it seems like they don't expect resources to be huge; they expect them to fit comfortably in the address space.
Is there any way for me to discard the resources I load after I'm done using them?

The system will discard them if it needs to. So long as you are not referring to the memory, it can be paged out if the system needs the physical memory for something else. In other words, it won't stop physical memory from being available to whatever needs it.
That said, linked resources are not intended to be huge. The point is that a module is mapped into a contiguous range of memory. If your module is really huge then it may be impossible to find such a contiguous range. What's more, the module's address range is reserved for the entire lifetime of the module. That means that nothing else in the process can use that virtual address range. So even if a contiguous range can be found, it is reserved for as long as the module stays loaded and cannot be used for anything else. This can easily become a problem for 32-bit applications.
So, by putting the huge resource in memory you won't incur a long-standing drain on physical resources, but you will put an unavoidable constraint on virtual memory address space resources.
The conclusion to draw is that such huge objects should be held in external files and not linked to the module as a resource. If you absolutely must use a resource in a PE module, then put the resource into a separate DLL. Load the DLL with LoadLibrary, pull out the resource using the module handle you got from LoadLibrary, and then unload the DLL with FreeLibrary.
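A minimal sketch of that separate-DLL approach (Windows-only; the file name "res.dll", resource ID 101 and type "BINDATA" are placeholders, not from the question):

```cpp
// Sketch: load a resource-only DLL, copy out the payload, then unload so
// the mapping - and its virtual address range - is released.
#include <windows.h>
#include <vector>

std::vector<char> LoadAndReleaseResource()
{
    // LOAD_LIBRARY_AS_DATAFILE maps the DLL for resource access only;
    // no code runs and DllMain is never called.
    HMODULE mod = LoadLibraryExW(L"res.dll", nullptr, LOAD_LIBRARY_AS_DATAFILE);
    if (!mod) return {};

    std::vector<char> data;
    if (HRSRC res = FindResourceW(mod, MAKEINTRESOURCEW(101), L"BINDATA")) {
        if (HGLOBAL blob = LoadResource(mod, res)) {
            const char* p = static_cast<const char*>(LockResource(blob));
            DWORD size = SizeofResource(mod, res);
            if (p && size)
                data.assign(p, p + size);   // copy before unmapping
        }
    }

    FreeLibrary(mod);  // releases the whole mapping, address space included
    return data;       // the LockResource pointer is now invalid; use the copy
}
```

Once FreeLibrary returns, both the physical pages and the reserved virtual address range are gone; only the copied buffer remains, and you free that whenever you like.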

Identifying data segment in Win32/Win64

I am a DLL loaded in the memory space of some process. I am part of a number of DLLs that are present in this process, some loaded dynamically and some statically.
There is a "data gem" left for me to discover somewhere in this process's space, and we will assume it is in a "data" segment (i.e., not in some weird self-modifying code).
I need to find it. I need to search memory, e.g., do a memcmp(), but I do not know where to start looking. Maybe I can brute-force search from 0 to many gigs, but that will throw read-access or execute-only exceptions, and maybe I will be able to handle those exceptions so that I do not bring the whole process down. But it sounds dodgy.
Is there a more intelligent way to search ? Off the top of my head, I could look into the data segments of the main process because there is a way to get the address ranges from the NT header somehow, and I do know the process which I have got loaded in. Then I could enumerate all loaded DLLs and look inside their spaces too.
Can anyone please suggest a method or even tell me if I am on the right track?
You can enumerate all the loaded modules in your process via EnumProcessModules, using GetCurrentProcess as the process handle. Then for each module you can call GetModuleInformation, which returns a MODULEINFO struct telling you exactly where in memory the module is loaded and its size. Alternatively, you can call GetModuleFileNameEx and examine the module on disk.
Do note that reading arbitrary memory in a process - even the one you're currently running in - can have issues. For example, if another thread is running at the same time as yours, it can affect the module table as you're iterating over it.
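A sketch of that enumeration (Windows-only; link against psapi.lib, or use the K32-prefixed variants on newer SDKs):

```cpp
// Sketch: list each loaded module's base address and size, giving the
// address ranges a targeted memory search could scan.
#include <windows.h>
#include <psapi.h>
#include <cstdio>

void DumpModuleRanges()
{
    HMODULE mods[1024];
    DWORD needed = 0;
    HANDLE self = GetCurrentProcess();

    if (!EnumProcessModules(self, mods, sizeof(mods), &needed))
        return;

    DWORD count = needed / sizeof(HMODULE);
    for (DWORD i = 0; i < count && i < 1024; ++i) {
        MODULEINFO mi = {};
        wchar_t name[MAX_PATH] = L"";
        if (GetModuleInformation(self, mods[i], &mi, sizeof(mi)) &&
            GetModuleBaseNameW(self, mods[i], name, MAX_PATH)) {
            // Each module occupies [lpBaseOfDll, lpBaseOfDll + SizeOfImage).
            wprintf(L"%-24s base=%p size=%u\n",
                    name, mi.lpBaseOfDll, mi.SizeOfImage);
        }
    }
}
```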
After some testing: a Win32 process may use memory that it has acquired via a number of methods; I think it all ends up going through VirtualAlloc, with higher-level wrappers such as HeapCreate et al. In the end the data gem may be in a module's "data" segments, or on a heap, or even on a stack - both of the latter allocated with VirtualAlloc. There may well be other memory allocation methods.
When we look at a Windows process it will have a bunch of DLLs loaded many of which will be using their own "heap" and/or direct VirtualAlloc calls. Others will be sharing the main process's heap.
I have enumerated the process's heaps using GetProcessHeaps and then HeapWalk, concentrating only on entries flagged PROCESS_HEAP_ENTRY_BUSY, and I have, luckily, found what I was looking for. My "heap walk" is by no means an exhaustive search.
I have not found a way, and it is academic for me now, to link a heap entry (block) to a particular module. Similarly if I were to look into all the VirtualAllocs I would not know how to trace the allocated blocks back to some code running inside a module. But that step is academic.
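The GetProcessHeaps/HeapWalk pass described above might look like this (Windows-only sketch; the 64-heap cap is arbitrary):

```cpp
// Sketch: enumerate every process heap and visit only the busy
// (allocated) entries - the blocks a memcmp() search would scan.
#include <windows.h>
#include <cstdio>

void WalkAllHeaps()
{
    HANDLE heaps[64];
    DWORD count = GetProcessHeaps(64, heaps);
    if (count > 64) count = 64;

    for (DWORD i = 0; i < count; ++i) {
        PROCESS_HEAP_ENTRY entry = {};      // lpData == NULL starts the walk
        HeapLock(heaps[i]);                 // stabilize the heap while walking
        while (HeapWalk(heaps[i], &entry)) {
            if (entry.wFlags & PROCESS_HEAP_ENTRY_BUSY) {
                // entry.lpData / entry.cbData delimit one allocated block.
                printf("block %p size %lu\n",
                       entry.lpData, (unsigned long)entry.cbData);
            }
        }
        HeapUnlock(heaps[i]);
    }
}
```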

How do DLLs handle concurrency from multiple processes?

I understand from Eric Lippert's answer that "two processes can share non-private memory pages. If twenty processes all load the same DLL, the processes all share the memory pages for that code. They don't share virtual memory address space, they share memory."
Now, if the same DLL file on the hard disk, after being loaded into applications, would share the same physical memory (be it RAM or page files), but mapped to different virtual memory address spaces, wouldn't that make it quite difficult to handle concurrency?
As I understand it, the concurrency concept in C++ is more about handling threading - a process can start multiple threads, each of which can run on an individual core, so when different threads call the DLL at the same time there might be data races, and we need mutexes, locks, signals, condition variables and so on.
But how does a DLL handle multiple processes? The same kind of data race will happen, won't it? What are the tools to handle that? Still the same toolset?
Now, if the same DLL file on the hard disk, after loaded into applications, would share the same physical memory (be it RAM or page files), but mapped to different virtual memory address spaces, wouldn't that make it quite difficult to handle concurrency?
As other answers have noted, the concurrency issues are of no concern if the shared memory is never written after it is initialized, which is typically the case for DLLs. If you are attempting to alter the code or resources in a DLL by writing into memory, odds are good you have a bad pointer somewhere and the best thing to do is to crash with an access violation.
However I wanted to also briefly follow up on your concern:
... mapped to different virtual memory address spaces ...
In practice we try very hard to avoid this happening because when it does, there can be a serious user-noticeable performance problem when loading code pages for the first time. (And of course a possible large increase in working set, which causes other performance problems.)
The code in a DLL often contains hard-coded virtual memory addresses, on the assumption that the code will be loaded into a known-at-compile-time virtual memory "base" address. If this assumption is violated at runtime -- because there's another DLL already there, for example -- then all those hard-coded addresses need to be patched at runtime, which is expensive.
If you want some historical details, see Raymond's article on the subject: https://blogs.msdn.microsoft.com/oldnewthing/20041217-00/?p=36953/
DLLs contain multiple "segments", and each segment has a descriptor telling Windows its characteristics; this is a 32-bit DWORD. Code segments obviously have the code bit set, and generally also the shareable bit. Read-only data can also be shareable, whereas writeable data generally does not have the shareable flag.
Now you can set an unusual combination of characteristics on an extra segment: writeable and shareable. That is not the default, and indeed might cause race conditions. So the final answer to your question is: the problem is avoided chiefly by the default characteristics of segments, and secondly any DLL which has a segment with non-standard characteristics must deal with the self-inflicted problems.
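For illustration, such a writeable-and-shareable section might be declared like this under MSVC (a sketch; the section name ".shared" and the counter variable are invented for the example):

```cpp
// Sketch: a data segment with the unusual writeable + shareable combination.
// Every process that loads this DLL sees the same physical page for
// g_sharedCounter, so cross-process races become the DLL's own problem.
#include <windows.h>

#pragma section(".shared", read, write, shared)
__declspec(allocate(".shared")) volatile LONG g_sharedCounter = 0;
// (Equivalently, the flags can be set with the linker option
// /SECTION:.shared,RWS instead of the "shared" pragma attribute.)

LONG BumpSharedCounter()
{
    // An interlocked operation guards against the race the text warns about;
    // plain ++g_sharedCounter would race across processes.
    return InterlockedIncrement(&g_sharedCounter);
}
```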

Are memory modules mapped into process' virtual space?

I see that on Windows the function EnumProcessModules returns a number of modules loaded for a specified process (some of these should be system DLLs like guard32.dll, version.dll, etc.).
My question is: are these modules mapped into the process' virtual space? Can I jump to an instruction located into one of these modules (of course knowing the address) from the main app code?
Yes, the DLLs should be mapped into the process virtual address space. The mapping may not be backed by a real physical page if the code in that page has not yet been executed. Of course, executing "random" bits of code without the right initialization or setup for the code to execute properly (e.g., calling a processing function that uses data that needs to be allocated by another function) will clearly end badly by some definition of bad. Also bear in mind that the DLL may well be loaded at different addresses on different runs of the same code, so you can't rely on the address of the DLL being constant - and it may well be completely different on another machine.
Yes, just call GetProcAddress using the module which you got from EnumProcessModules. GetProcAddress calculates the function offset within the module.
Yes, any DLL code that can be invoked directly from your own executable must be mapped into your process space. You can get a precise chart of your process virtual memory space using SysInternal's VMMap utility: http://technet.microsoft.com/en-us/sysinternals/dd535533
As mentioned in other answers, the virtual address space is largely, if not entirely, dynamic.
There are cases where certain shared libraries are not directly accessible from your process. These are typically sandboxed (secured) kernel or driver libraries, which are invoked through a special secure layer/API that performs parameter validation and then executes a ring/context switch into a different virtual process address space, or passes the command on via a secured inter-thread communication queue. These are expensive operations so they are typically reserved for use only when there are benefits to system stability.

Swapping objects out to file

My C++ application occasionally runs out of memory due to large amounts of data being retrieved from a database. It has to run on 32-bit WinXP machines.
Is it possible to transparently (for most of the existing code) swap the data objects out to disk and read them into memory only on demand, so I'm not limited to the 2GB that 32-bit Windows gives the process?
I've looked at VirtualAlloc and Address Window Extensions but I'm not sure it's what I want.
I also found this SO question where the questioner creates a file mapping and wants to create objects in there. One answer suggests using placement new which sounds like it would be pretty transparent to the rest of the code.
Will this prevent my application from running out of physical memory? I'm not entirely sure, because after all there is still the 32-bit address space limit. Or is this a different kind of problem that will occur when trying to create a lot of objects?
So long as you are using a 32-bit operating system there is nothing you can do about this. There is no way to have more than 3GB (2GB in the case of Windows) of data in virtual memory, whether or not it's actually swapped out to disk.
Historically databases have always handled this problem by using read, write and seek. So rather than accessing data directly from memory, they use a fake (64-bit) pointer. Data is split into blocks (normally around 4kb), and a number of these blocks are allocated in memory. When they want to access data from a fake pointer address they check if the block is loaded into memory and if it is they access it from there. If it is not then they find an empty slot and copy it in, then return the address. If there are no slots free then a piece of data will be written back out to disk (if it's been modified) and that slot will be reused.
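The scheme described above can be sketched in ordinary C++. This is a toy illustration, not any real database's implementation: the "disk" is simulated with an in-memory vector, blocks are 4 KB, and eviction is LRU with write-back.

```cpp
// Toy block cache: 64-bit "fake pointers" are byte offsets into a backing
// store; only a fixed number of 4 KB blocks live in memory at once.
#include <cstdint>
#include <cstring>
#include <list>
#include <unordered_map>
#include <vector>

class BlockCache {
    static constexpr std::size_t kBlockSize = 4096;
    std::size_t capacity_;                  // number of in-memory slots
    std::vector<char> disk_;                // stands in for the backing file
    std::list<uint64_t> lru_;               // front = most recently used block
    struct Slot {
        std::vector<char> data;
        bool dirty;
        std::list<uint64_t>::iterator pos;  // position in the LRU list
    };
    std::unordered_map<uint64_t, Slot> slots_;  // block number -> slot

    Slot& fetch(uint64_t block) {
        auto it = slots_.find(block);
        if (it != slots_.end()) {           // hit: move to MRU position
            lru_.splice(lru_.begin(), lru_, it->second.pos);
            return it->second;
        }
        if (slots_.size() == capacity_) {   // full: evict the LRU block
            uint64_t victim = lru_.back();
            Slot& v = slots_[victim];
            if (v.dirty)                    // write modified data back to "disk"
                std::memcpy(&disk_[victim * kBlockSize], v.data.data(), kBlockSize);
            lru_.pop_back();
            slots_.erase(victim);
        }
        lru_.push_front(block);             // load the block into a fresh slot
        Slot& s = slots_[block];
        s.data.assign(disk_.begin() + block * kBlockSize,
                      disk_.begin() + (block + 1) * kBlockSize);
        s.dirty = false;
        s.pos = lru_.begin();
        return s;
    }

public:
    BlockCache(std::size_t diskBlocks, std::size_t cacheSlots)
        : capacity_(cacheSlots), disk_(diskBlocks * kBlockSize) {}

    char read(uint64_t addr) {
        return fetch(addr / kBlockSize).data[addr % kBlockSize];
    }
    void write(uint64_t addr, char value) {
        Slot& s = fetch(addr / kBlockSize);
        s.data[addr % kBlockSize] = value;
        s.dirty = true;                     // written back on eviction
    }
};
```

A read or write through a fake pointer first faults its block into a slot, evicting (and writing back) the least recently used block when all slots are full - exactly the check-load-evict cycle described above.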
The real beauty of this is that if your system has enough RAM then the operating system will cache much more than 2GB of this data in RAM at any point in time, and when you feel like you are actually reading and writing from disk the operating system will probably just be copying data around in memory. This, of course, requires a 32-bit operating system that supports more than 3GB of physical memory, such as Linux or Windows Server with PAE.
SQLite has a nice self-contained implementation of this, which you could probably make use of with little effort.
If you do not wish to do this then your only alternatives are to either use a 64-bit operating system or to work with less data at any given point in time.

DLL size in memory & size on the hard disk

Is there a relationship between DLL size in memory and size on the hard disk?
This is because I am using Task Manager extension (MS), and I can go to an EXE in the list and right click -> Module, then I can see all the DLLs this EXE is using. It has a Length column, but is it in bytes? And the value (Length) of the DLL seems to be different from the (DLL) size on the hard disk. Why?
There's a relationship, but it's not entirely direct or straightforward.
When your DLL is first used, it gets mapped to memory. That doesn't load it into memory, just allocates some address space in your process where it can/could be loaded when/if needed. Then, individual pages of the DLL get loaded into memory via demand paging -- i.e., when you refer to some of the address space that got allocated, the code (or data) that's mapped to that/those address(es) will be loaded if it's not already in memory.
Now, the address mapping does take up a little space (one 4K page for each megabyte of address space that gets mapped). Of course, when you load some data into memory, that uses up memory too.
Note, however, that most pages can/will be shared between processes too, so if your DLL was used by 5 different processes at once, it would be mapped 5 times (i.e., once to each process that used it) but there would still only be one physical copy in memory (at least normally).
Between those, it can be a little difficult to even pin down exactly what you mean by the memory consumption of a particular DLL.
There are two parts that come into play in determining the size of a DLL in memory:
As everyone else pointed out, DLLs get memory mapped; this leads to their size being page aligned (one of the reasons preferred load addresses, back in the day, had to be page aligned). Generally, page alignment is 4 KB for 32-bit systems and 8 KB for 64-bit systems (for a more in-depth look at this on Windows, see this).
DLLs contain a segment for uninitialized data; on disk this segment is compressed, generally to a base + size. When the DLL is loaded and initialized, the space for the .bss segment gets allocated, increasing its size. Generally this is small and will be absorbed by the page alignment, but if a DLL contains huge static buffers, this can balloon its virtualized size.
The memory footprint will usually be bigger than the on-disk size because when the DLL is mapped into memory it is page aligned. Standard page sizes are 4 KB and 8 KB, so if your DLL is 1 KB of code it's still going to use 4 KB in memory.
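The rounding involved is trivial but worth seeing; a sketch (page size hard-coded at 4 KB purely for illustration):

```cpp
// Round a section's raw size up to the next page boundary - the in-memory
// size of a mapped section, per the explanation above.
#include <cstddef>

std::size_t PageAlignedSize(std::size_t rawSize, std::size_t pageSize = 4096)
{
    if (rawSize == 0) return 0;
    return (rawSize + pageSize - 1) / pageSize * pageSize;
}
```

So a 1 KB section still occupies one full 4 KB page, and a 4097-byte one occupies two.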
Don't think of a .dll or a .exe as something that gets copied into memory to be executed.
Think of it as a set of instructions for the loader.
Sure, it contains the program text and static data.
More importantly, it contains all the information allowing that text to be relocated, and to have all its unsatisfied references hooked up, and to export references that other modules may need.
Then if there's symbol and line number information for debugging, that's still more text.
So in general you would expect it to be larger than the memory image.
It all depends on what you call "memory", and what exactly does your TaskManager extension show.
Every executable module (EXE/DLL) is mapped into an address space. The size of this mapping equals the module's size. And, I guess, this is what your "extension" shows you.