DLL size in memory & size on the hard disk - c++

Is there a relationship between DLL size in memory and size on the hard disk?
I ask because I am using the Task Manager extension (MS): I can go to an EXE in the list, right-click -> Module, and see all the DLLs this EXE is using. There is a Length column, but is it in bytes? The Length value for a DLL seems to be different from that DLL's size on the hard disk. Why?

There's a relationship, but it's not entirely direct or straightforward.
When your DLL is first used, it gets mapped into memory. That doesn't load it into memory; it just reserves some address space in your process where it can be loaded when and if it's needed. Then individual pages of the DLL get loaded into memory via demand paging -- i.e., when you refer to some of the address space that was reserved, the code (or data) mapped to those addresses is loaded if it's not already in memory.
Now, the address mapping does take up a little space (one 4K page for each megabyte of address space that gets mapped). Of course, when you load some data into memory, that uses up memory too.
Note, however, that most pages can/will be shared between processes too, so if your DLL was used by 5 different processes at once, it would be mapped 5 times (i.e., once to each process that used it) but there would still only be one physical copy in memory (at least normally).
Between those, it can be a little difficult to even pin down exactly what you mean by the memory consumption of a particular DLL.
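For a concrete look at the difference, you can compare a module's SizeOfImage (the amount of address space the loader reserves for it) with its size on disk. A minimal sketch, assuming a Windows build linked with Psapi.lib and using user32.dll purely as an example:
#include <windows.h>
#include <psapi.h>
#include <cstdio>
#pragma comment(lib, "psapi.lib")   // MSVC: link against Psapi

int main()
{
    // Use an arbitrary, always-present system DLL as the example module.
    HMODULE mod = LoadLibraryA("user32.dll");
    if (!mod) return 1;

    // SizeOfImage: how much address space the loader reserved for the module
    // (section sizes rounded up to page granularity).
    MODULEINFO info = {};
    GetModuleInformation(GetCurrentProcess(), mod, &info, sizeof(info));

    // Size on disk, for comparison.
    char path[MAX_PATH] = {};
    GetModuleFileNameA(mod, path, MAX_PATH);
    WIN32_FILE_ATTRIBUTE_DATA fad = {};
    GetFileAttributesExA(path, GetFileExInfoStandard, &fad);

    printf("%s\n  mapped size : %lu bytes\n  size on disk: %lu bytes\n",
           path, info.SizeOfImage, fad.nFileSizeLow);

    FreeLibrary(mod);
    return 0;
}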

There are two parts that come into play in determining the size of a dll in memory:
As everyone else pointed out, DLLs get memory mapped, which leads to their size being page aligned (one of the reasons preferred load addresses had to be page aligned back in the day). Generally, page alignment is 4 KB; some 64-bit systems use 8 KB pages (for a more in-depth look at this on Windows, see this).
DLLs contain a segment for uninitialized data. On disk this segment is compressed, generally down to a base + size; when the DLL is loaded and initialized, the space for the .bss segment is actually allocated, increasing the size. Generally this is small and will be absorbed by the page alignment, but if a DLL contains huge static buffers, this can balloon its virtualized size.
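To see the page alignment and the uninitialized-data effect for yourself, you can walk the PE headers of a loaded module and compare each section's VirtualSize (its size once mapped) with its SizeOfRawData (its size on disk). A minimal sketch, using user32.dll purely as an example; whether uninitialized data shows up as a separate .bss section or is folded into .data depends on the toolchain:
#include <windows.h>
#include <cstdio>

int main()
{
    // Inspect the headers of a loaded module; the HMODULE is its base address.
    BYTE* base = reinterpret_cast<BYTE*>(LoadLibraryA("user32.dll"));
    if (!base) return 1;

    auto dos = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
    auto nt  = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);

    printf("SectionAlignment: 0x%lx  FileAlignment: 0x%lx  SizeOfImage: 0x%lx\n",
           nt->OptionalHeader.SectionAlignment,
           nt->OptionalHeader.FileAlignment,
           nt->OptionalHeader.SizeOfImage);

    // VirtualSize is what the section occupies once mapped; SizeOfRawData is
    // what is stored on disk. Uninitialized data has a large VirtualSize but
    // little or no raw data.
    IMAGE_SECTION_HEADER* sec = IMAGE_FIRST_SECTION(nt);
    for (WORD i = 0; i < nt->FileHeader.NumberOfSections; ++i, ++sec) {
        printf("%-8.8s  VirtualSize: 0x%08lx  SizeOfRawData: 0x%08lx\n",
               reinterpret_cast<const char*>(sec->Name),
               sec->Misc.VirtualSize, sec->SizeOfRawData);
    }
    return 0;
}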

The memory footprint will usually be bigger than the on-disk size because when the DLL is mapped into memory it is page aligned. Standard page sizes are 4 KB and 8 KB, so if your DLL contains 1 KB of code, it's still going to use 4 KB in memory.

Don't think of a .dll or a .exe as something that gets copied into memory to be executed.
Think of it as a set of instructions for the loader.
Sure, it contains the program text and static data.
More importantly, it contains all the information allowing that text to be relocated, and to have all its unsatisfied references hooked up, and to export references that other modules may need.
Then if there's symbol and line number information for debugging, that's still more text.
So in general you would expect it to be larger than the memory image.

It all depends on what you call "memory", and what exactly your TaskManager extension shows.
Every executable module (EXE/DLL) is mapped into an address space. The size of this mapping equals its size. And, I guess, this is what your "extension" shows you.

Related

What is up with memory mapped files and actual memory usage?

Can't really find any specifics on this; here's all I know about MMFs in Windows:
Creating a memory-mapped file in Windows adds nothing to the apparent amount of memory a program uses
Creating a view of that file consumes memory equivalent to the view size
This looks rather backwards to me, since, for one, I know that the MMF itself actually has memory... somewhere. If I write something into an MMF and destroy the view, the data is still there. Meanwhile, why does the view take any memory at all? It's just a pointer, no?
Then there's the weirdness with what's actually in RAM and what's on the disk. In large MMFs with a distributed-looking access pattern, sometimes the speed is there and sometimes it's not. I'm guessing some of it sometimes gets stored in the file it is tied to, or in the paging file, but really, I have no clue.
Anyways, the problem that drove me to investigate this is that I have a ~2 GB file that I want multiple programs to share. I can't create a 2 GB view in each of them, since I'm just "out of memory", so I have to create/destroy smaller ones. This creates a lot of overhead due to the additional offset calculations and the creation of the view itself. Can anybody explain to me why it is like this?
On a demand-paged virtual memory operating system like Windows, the view of an MMF occupies only address space. Just numbers to the processor, one for each 4096 bytes. You only start using RAM when you actually use the view, reading or writing data. At that point you trigger a page fault and force the OS to map the virtual memory page to physical memory. That's the "demand-paged" part.
You can't get a single chunk of 2 GB of address space in a 32-bit process, since there would be no room for anything else. The limit is the largest hole in the address space between other allocations for code and data, which usually hovers around ~650 megabytes, give or take. You'll need to target x64, or build an x86 program that's linked with /LARGEADDRESSAWARE and runs on a 64-bit operating system. That's a backdoor which is getting to be pretty pointless these days.
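If you are stuck in a 32-bit process, the usual workaround is exactly what you describe: map a smaller view of the file at the offset you need, remembering that view offsets must be a multiple of the system allocation granularity (usually 64 KB). A rough sketch, with "big.dat" standing in for your ~2 GB file:
#include <windows.h>
#include <cstdio>

int main()
{
    // "big.dat" is just a placeholder for the large shared file.
    HANDLE file = CreateFileA("big.dat", GENERIC_READ | GENERIC_WRITE,
                              FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    // The mapping object itself consumes no address space in this process.
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READWRITE, 0, 0, nullptr);
    if (!mapping) return 1;

    // View offsets must be multiples of the allocation granularity (usually 64 KB).
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    ULONGLONG offset  = 512ull * 1024 * 1024;                 // where we want to look
    ULONGLONG aligned = offset - (offset % si.dwAllocationGranularity);
    SIZE_T viewSize   = 64 * 1024 * 1024;                     // a 64 MB window, not 2 GB
    SIZE_T mapBytes   = static_cast<SIZE_T>(viewSize + (offset - aligned));

    char* view = static_cast<char*>(MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS,
                                    static_cast<DWORD>(aligned >> 32),
                                    static_cast<DWORD>(aligned & 0xFFFFFFFF),
                                    mapBytes));
    if (!view) return 1;

    char* p = view + (offset - aligned);   // the byte we actually wanted
    p[0] = 42;                             // first touch triggers the page fault

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}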
The point of a memory-mapped file is that it lets you manipulate its data without making I/O calls. Because of this behavior, when you access the file, Windows loads the relevant pages into physical memory, so the data can be manipulated there rather than on the disk. You can read more about this here: http://blogs.msdn.com/b/khen1234/archive/2006/01/30/519483.aspx
Anyways, the problem that drove me to investigate this is that I have a ~2 GB file that I want multiple programs to share. I can't create a 2 GB view in each of them, since I'm just "out of memory", so I have to create/destroy smaller ones.
The most likely cause is that the programs are 32-bit. 32-bit programs (by default) only have 2GB of address space so you can't map a 2GB file in a single view. If you rebuild them in 64-bit mode, the problem should go away.

How are DLLs mapped into the current program's virtual address space

When I load a DLL in a program, how does that work in memory? Does it get loaded into my virtual address space? If it does, where are the text and data segments stored? I have a 32-bit program I'm maintaining, which uses a large part of the available heap for image processing routines, and I want to know how much I should worry about loading DLLs which themselves might use a lot of space.
Yes: everything that your process needs to access must be in its address space. This applies to your code as well as your data.
Here you'll find more about the anatomy of process memory and address space,
and here it's explained that DLLs are loaded into the virtual address space.
Remark: the DLL might be shared between several processes: it is then loaded only once into memory by the OS. But every process using it could potentially see it at a different place in its own virtual address space (see also this SO answer about relative virtual addresses).
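A quick way to convince yourself that the DLL really lands inside your own virtual address space is to load one and query the region it occupies. A small sketch (Windows-specific, using user32.dll as an arbitrary example):
#include <windows.h>
#include <cstdio>

int main()
{
    // Load a DLL: the returned HMODULE is simply its base address inside
    // this process's virtual address space.
    HMODULE mod = LoadLibraryA("user32.dll");
    if (!mod) return 1;
    void* fn = reinterpret_cast<void*>(GetProcAddress(mod, "MessageBoxA"));

    // VirtualQuery confirms the code lives in a committed region of our own
    // address space, backed by the mapped image (Type == MEM_IMAGE).
    MEMORY_BASIC_INFORMATION mbi = {};
    VirtualQuery(fn, &mbi, sizeof(mbi));

    printf("module base      : %p\n", static_cast<void*>(mod));
    printf("MessageBoxA at   : %p\n", fn);
    printf("region base/size : %p / %zu bytes, type 0x%lx (MEM_IMAGE is 0x1000000)\n",
           mbi.AllocationBase, mbi.RegionSize, mbi.Type);
    return 0;
}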

Memory usage in C++ program, as reported by Gnome resource monitor: confusion

I am looking at the memory consumed by my app to make sure I am not allocating too much, and am confused as to what Gnome Resource Monitor is showing me. I have used the following pieces of code to allocate memory in two separate apps that are otherwise identical; they contain nothing other than this code and a scanf() call to pause execution whilst I grab the memory usage:
malloc(1024 * 1024 * 100);
and
char* p = new char[1024*1024*100];
The following image shows the memory usage of my app before and after each of these lines:
Now, I have read a lot (but obviously not enough) about memory usage (including this SO question), and am having trouble differentiating between writeable memory and virtual memory. According to the linked question,
"Writeable memory is the amount of address space that your process has
allocated with write privileges"
and
"Virtual memory is the address space that your application has
allocated"
1) If I have allocated memory myself, surely it has write privileges?
2) The linked question also states (regarding malloc)
"...which won't actually allocate any memory. (See the rant at the end
of the malloc(3) page for details.)"
I don't see any "rant", and my images show the virtual memory has increased! Can someone explain this please?
3) If I have purely the following code:
char* p = new char[100];
...the resource monitor shows that both Memory and Writeable Memory have increased by 8KB - the same as when I was allocating a full one megabyte! - with Virtual memory increasing by 0.1. What is happening here?
4) What column should I be looking at in the resource monitor to see how much memory my app is using?
Thanks very much in advance for participation, and sorry if have been unclear or missed anything that could have led me to find answers myself.
A more precise way to understand the memory usage of a running process on Linux is to use the proc(5) file system.
So, if your process pid is 1234, try
cat /proc/1234/maps
Notice that processes have their address space in virtual memory. That address space can be changed with mmap(2) and other syscalls(2). For several efficiency reasons, malloc(3) and free avoid making too many of these syscalls and prefer to re-use previously free-d memory zones. So when your program is free-ing (or, in C++, delete-ing) some memory chunk, that chunk is often marked as re-usable but is not released back to the kernel (by e.g. munmap). Likewise, if you malloc only 100 bytes, your libc is allowed to e.g. request a whole megabyte using mmap (and the next time you call malloc for e.g. 200 bytes, it will use part of that megabyte).
See also http://linuxatemyram.com/ and Advanced Linux Programming (and this question about memory overcommit)
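To watch the difference between allocated address space and resident memory on Linux, you can print the VmSize and VmRSS lines from /proc/self/status around an allocation. A small sketch:
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Print the VmSize (address space) and VmRSS (resident pages) lines
// from /proc/self/status.
static void show(const char* label)
{
    std::printf("-- %s --\n", label);
    std::FILE* f = std::fopen("/proc/self/status", "r");
    if (!f) return;
    char line[256];
    while (std::fgets(line, sizeof(line), f)) {
        if (std::strncmp(line, "VmSize:", 7) == 0 ||
            std::strncmp(line, "VmRSS:", 6) == 0)
            std::fputs(line, stdout);
    }
    std::fclose(f);
}

int main()
{
    show("start");

    // Allocating grows the address space (VmSize) but, thanks to demand
    // paging and overcommit, barely changes the resident size (VmRSS).
    size_t bytes = 100 * 1024 * 1024;
    char* p = static_cast<char*>(std::malloc(bytes));
    show("after malloc(100 MB)");

    // Touching every page forces them into physical memory.
    std::memset(p, 1, bytes);
    show("after writing to it");

    std::free(p);
    return 0;
}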
The classes of memory reported by the Gnome resource monitor (and in fact, the vast majority of resource reporting tools) are not simply separate classes of memory - there is overlap between them because they are reporting on different characteristics of the memory. Some of those different characteristics include:
virtual vs physical - all memory in a process's address space on modern operating systems is virtual; that virtual address space is mapped to actual physical memory by the hardware capabilities of the CPU; how that mapping is done is a complex topic in itself, with a lot of differences between different architectures
memory access permissions - memory can be readable, writable, or executable, or any combination of the three (in theory - some combinations don't really make sense and so may actually not be allowed by hardware and/or software, but the point is that these permissions are treated separately)
resident vs non-resident - with a virtual memory system, much of the address space of a process may not actually be currently mapped to real physical memory, for a variety of reasons - it may not have been allocated yet; it may be part of the binary or one of the libraries, or even a data segment that has not yet been loaded because the program has not called for it yet; it may have been swapped out to a swap area to free up physical memory for a different program that needed it
shared vs private - parts of a process's virtual address space that are read-only (for example, the actual code of the program and most of the libraries) may be shared with other processes that use the same libraries or program - this is a big advantage for overall memory usage, as having 37 different xterm instances running does not mean that the code for xterm needs to be loaded 37 different times into memory - all the processes can share one copy of the code
Because of these, and a few other factors (IPC shared memory, memory-mapped files, physical devices that have memory regions mapped in hardware, etc.), determining the actual memory in use by any single process, or even the entire system, can be complicated.

DLL caching issues

From a highest-possible-performance point of view, does the choice between static and dynamic linking also have an impact on performance because of a higher cache-miss ratio for DLLs?
My idea is that when a library is statically linked, the whole program is loaded in one place, or nearby. But when dynamically linked, the DLL can be loaded somewhere else, and its variables can be allocated "too far" away.
Is it true, or there's no performance penalty for a DLL in terms of cache miss ratio? (fast C/C++ code only)
"whole program is loaded on once place": your system's memory manager will still map executable memory pages onto physical memory to it's liking - you don't control that. At run-time, physical pages will be swapped out to disk if other portions of your executable code are needed.
Using a shared library may reduce the number of code pages needed in physical memory when multiple processes can actually share the library.
Summarizing:
NO: dynamic or static linkage does not influence cache-misses directly. Dynamic linkage may reduce cache misses for highly reused libraries.
I'd say profile it first!
Physical location does not influence access time. The address space only seems linear but could be virtually mapped to any physical memory page.
You'd need to use custom allocation and VirtualLock to get some control over the physical location of pages (see the sketch after these notes).
Notes
Usually using shared DLLs mitigates the problem you outlined precisely by sharing pages with other processes that have the same image mapped. This leads to fewer pages cached and less need to swap these.
I'd say that the data segment is not in fact mapped but rather allocated from the process's private address space, so the locality could be similar to statically linked data segments. (You could try to use a heap debugger/visualizer to find out how that works.)
If you want a simple means to get full control, simply allocate everything from the heap, using your preferred allocation scheme. If there is static data in a DLL, just copy it into that area.
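If you do want that kind of control, here is a rough sketch of the custom-allocation-plus-VirtualLock idea mentioned above (the working-set minimum usually has to be raised first, or VirtualLock fails):
#include <windows.h>
#include <cstdio>

int main()
{
    const SIZE_T size = 4 * 1024 * 1024;   // 4 MB of hot data

    // Locking more than the default working-set minimum fails, so raise it first.
    SetProcessWorkingSetSize(GetCurrentProcess(), 16 * 1024 * 1024, 64 * 1024 * 1024);

    // Reserve and commit page-aligned memory ourselves instead of relying on
    // the DLL's data segment or the default heap.
    void* p = VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!p) return 1;

    // VirtualLock keeps these pages resident (they won't be paged out), which
    // is about as much control over physical placement as user mode gets.
    if (!VirtualLock(p, size))
        printf("VirtualLock failed: %lu\n", GetLastError());

    // ... use p for the latency-critical data ...

    VirtualUnlock(p, size);
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}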
Memory doesn't need to be contiguous for good cache performance. The cache line size, which ranges from a few bytes to a few hundred, is typically much smaller than a DLL.

How much memory should you be able to allocate?

Background: I am writing a C++ program working with large amounts of geodata, and wish to load large chunks to process at a single go. I am constrained to working with an app compiled for 32 bit machines. The machine I am testing on is running a 64 bit OS (Windows 7) and has 6 gig of ram. Using MS VS 2008.
I have the following code:
byte* pTempBuffer2[3];
try
{
    //size_t nBufSize = nBandBytes*m_nBandCount;
    pTempBuffer2[0] = new byte[nBandBytes];
    pTempBuffer2[1] = new byte[nBandBytes];
    pTempBuffer2[2] = new byte[nBandBytes];
}
catch (std::bad_alloc)
{
    // If we didn't get the memory just don't buffer and we will get data one
    // piece at a time.
    return;
}
I was hoping that I would be able to allocate memory until the app reached the 4-gigabyte limit of 32-bit addressing. However, when nBandBytes is 466,560,000, the new throws std::bad_alloc on the second try. At this stage, the working set (memory) value for the process is 665,232 K. So I don't seem to be able to get even a gig of memory allocated.
There has been some mention of a 2 gig limit for applications in 32 bit Windows which may be extended to 3 gig with the /3GB switch for win32. This is good advice under that environment, but not relevant to this case.
How much memory should you be able to allocate under the 64 bit OS with a 32 bit application?
As much as the OS wants to give you. By default, Windows lets a 32-bit process have 2GB of address space. And this is split into several chunks. One area is set aside for the stack, others for each executable and dll that is loaded. Whatever is left can be dynamically allocated, but there's no guarantee that it'll be one big contiguous chunk. It might be several smaller chunks of a couple of hundred MB each.
If you compile with the LargeAddressAware flag, 64-bit Windows will let you use the full 4 GB address space, which should help a bit, but in general:
you shouldn't assume that the available memory is contiguous, and you should be able to work with multiple smaller allocations rather than a few big ones; and
you should compile as a 64-bit application if you need a lot of memory.
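As a crude way to see how much (and how fragmented) address space your 32-bit process really has, you can keep allocating fixed-size blocks until new fails and add up what you got; a sketch:
#include <cstdio>
#include <new>
#include <vector>

int main()
{
    const size_t chunk = 100 * 1024 * 1024;   // 100 MB pieces
    std::vector<char*> blocks;
    size_t total = 0;

    // Keep grabbing chunks until the address space can't supply another one.
    for (;;) {
        char* p = new (std::nothrow) char[chunk];
        if (!p) break;
        blocks.push_back(p);
        total += chunk;
    }

    std::printf("Got %zu MB in %zu blocks before allocation failed\n",
                total / (1024 * 1024), blocks.size());

    for (char* p : blocks) delete[] p;
    return 0;
}
In a default 32-bit process this typically totals well under 2 GB, and shrinking the chunk size usually lets you squeeze out more, which is the fragmentation at work.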
On 32-bit Windows, a normal process can use at most 2 GB, but with the /3GB switch it can reach 3 GB (on Windows 2003).
But in your case, I think you are allocating contiguous memory, and that is why the exception occurred.
You can allocate as much memory as your page file will let you - even without the /3GB switch, you can allocate 4GB of memory without much difficulty.
Read this article for a good overview of how to think about physical memory, virtual memory, and address space (all three are different things). In a nutshell, you have exactly as much physical memory as you have RAM, but your app really has no interaction with that physical memory at all - it's just a convenient place to store the data that is in your virtual memory. Your virtual memory is limited by the size of your pagefile, and the amount your app can use is limited by how much other apps are using (although you can allocate more, provided you don't actually use it).
Your address space in the 32-bit world is 4 GB. Of those, 2 GB are allocated to the kernel (or 1 GB if you use the /3GB switch). Of the 2 GB that are left, some is going to be used up by your stack and some by the program you are currently running (and all the DLLs, etc.). It's going to get fragmented, and you are only going to be able to get so much contiguous space - this is where your allocation is failing. But since that address space is just a convenient way to access the virtual memory that has been allocated for you, it's possible to allocate much more memory and bring chunks of it into your address space a few at a time.
Raymond Chen has an example of how to allocate 4GB of memory and map part of it into a section of your address space.
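The general shape of that technique is a pagefile-backed section that is larger than anything you could map at once, with small windows mapped in and out of the address space as needed. A rough sketch (not Raymond Chen's code; it needs enough pagefile to back the full section):
#include <windows.h>
#include <cstdio>

int main()
{
    // A 4 GB pagefile-backed section: far more than a 32-bit process can map at once.
    const ULONGLONG total = 4ull * 1024 * 1024 * 1024;
    HANDLE section = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr,
                                        PAGE_READWRITE,
                                        static_cast<DWORD>(total >> 32),
                                        static_cast<DWORD>(total & 0xFFFFFFFF),
                                        nullptr);
    if (!section) return 1;

    // Bring 256 MB slices into the address space one at a time.
    const SIZE_T sliceSize = 256 * 1024 * 1024;
    for (ULONGLONG offset = 0; offset < total; offset += sliceSize) {
        void* slice = MapViewOfFile(section, FILE_MAP_ALL_ACCESS,
                                    static_cast<DWORD>(offset >> 32),
                                    static_cast<DWORD>(offset & 0xFFFFFFFF),
                                    sliceSize);
        if (!slice) { printf("map failed at offset %llu\n", offset); break; }

        // ... process this 256 MB window of the data ...
        static_cast<char*>(slice)[0] = 1;

        UnmapViewOfFile(slice);   // give the address space back before the next slice
    }

    CloseHandle(section);
    return 0;
}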
Under 32-bit Windows, the maximum allocatable is 16 TB; under 64-bit Windows, it is 256 TB.
And if you're really into how memory management works in Windows, read this article.
During the Elephants Dream project, the Blender Foundation had similar problems with Blender 3D (though on Mac). Can't include the link, but google "blender3d memory allocation problem" and it will be the first item.
The solution involved File Mapping. Haven't tried it myself but you can read up on it here: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx
With nBandBytes at 466,560,000, you are trying to allocate 1.4 GB. A 32-bit app typically only has access to 2 GB of memory (more if you boot with /3GB and the executable is marked as large address space aware). You may be hard pressed to find that many blocks of contiguous address space for your large chunks of memory.
If you want to allocate gigabytes of memory on a 64-bit OS, use a 64-bit process.
You should be able to allocate a total of about 2GB per process. This article (PDF) explains the details. However, you probably won't be able to get a single, contiguous block that is even close to that large.
Even if you allocate in smaller chunks, you might not be able to get the memory you need, especially if the surrounding program has unpredictable memory behavior, or if you need to run on different operating systems. In my experience, the usable heap space in a 32-bit process caps out at around 1.2 GB.
At this amount of memory, I would recommend manually writing to disk. Wrap your arrays in a class that manages the memory and writes to temporary files when necessary. Hopefully the characteristics of your program are such that you could effectively cache parts of that data without hitting the disk too much.
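A minimal sketch of that idea, with a hypothetical DiskBackedBuffer class that keeps one fixed-size chunk in RAM and spills everything else to a temporary file (edge cases such as partial last chunks and error handling are ignored for brevity):
#include <cstdio>
#include <vector>

// Hypothetical sketch: a big byte array that keeps only one chunk in RAM
// and stores everything else in a temporary file.
class DiskBackedBuffer {
public:
    DiskBackedBuffer(size_t totalBytes, size_t chunkBytes)
        : chunkSize_(chunkBytes), chunk_(chunkBytes, 0), loaded_(0), tmp_(std::tmpfile())
    {
        // Pre-size the temp file so any chunk can be read back later.
        std::fseek(tmp_, static_cast<long>(totalBytes - 1), SEEK_SET);
        std::fputc(0, tmp_);
    }
    ~DiskBackedBuffer() { if (tmp_) std::fclose(tmp_); }

    unsigned char& at(size_t index)
    {
        size_t chunk = index / chunkSize_;
        if (chunk != loaded_) {       // swap the resident chunk if needed
            flush();
            loaded_ = chunk;
            std::fseek(tmp_, static_cast<long>(loaded_ * chunkSize_), SEEK_SET);
            std::fread(chunk_.data(), 1, chunk_.size(), tmp_);
        }
        return chunk_[index % chunkSize_];
    }

private:
    void flush()
    {
        std::fseek(tmp_, static_cast<long>(loaded_ * chunkSize_), SEEK_SET);
        std::fwrite(chunk_.data(), 1, chunk_.size(), tmp_);
    }

    size_t chunkSize_;
    std::vector<unsigned char> chunk_;  // the single resident chunk
    size_t loaded_;                     // index of the chunk currently in RAM
    std::FILE* tmp_;                    // everything else lives here
};

int main()
{
    // ~1.4 GB of band data, only 64 MB of it resident at a time.
    DiskBackedBuffer buf(1400ull * 1024 * 1024, 64 * 1024 * 1024);
    buf.at(0) = 1;
    buf.at(1300ull * 1024 * 1024) = 2;   // different chunk: the old one is flushed to disk
    std::printf("%d\n", buf.at(0));       // reads chunk 0 back from the temp file
    return 0;
}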
Sysinternals VMMap is great for investigating virtual address space fragmentation, which is probably limiting how much contiguous memory you can allocate. I recommend setting it to display free space, then sorting by size to find the largest free areas, then sorting by address to see what is separating the largest free areas (probably rebased DLLs, shared memory regions, or other heaps).
Avoiding extremely large contiguous allocations is probably for the best, as others have suggested.
Setting LARGE_ADDRESS_AWARE=YES (as jalf suggested) is good, as long as the libraries that your application depends on are compatible with it. If you do so, you should test your code with the AllocationPreference registry key set to enable top-down virtual address allocation.