How are DLLs mapped into the current program's virtual address space? - c++

When I load a DLL in my program, what happens in memory? Does it get loaded into my virtual address space? If it does, where are the text and data segments stored? I have a 32-bit program I'm maintaining, which uses a large part of the available heap for image processing routines, and I want to know how much I should worry about loading DLLs that might themselves use a lot of space.

Yes: everything that your process needs to access must be in its address space. This applies to your code and to your data as well.
Here you'll find more about the anatomy of process memory and address space,
and here it's explained that DLLs are loaded into the virtual address space.
Remark: a DLL might be shared between several processes; it is then loaded only once into physical memory by the OS. But every process using it could potentially see it at a different place in its own virtual address space (see also this SO answer about relative virtual addresses).

Related

DLL loading and system image space

DLLs are only ever really loaded once. The dynamic loader will link and redirect calls if your app starts using a specific DLL, something from MS Office for example.
However, WHEN does repeated referencing of a DLL by various users and apps on the system push the DLL image into system space, so that ALL apps can use it?
Otherwise, does the loaded image remain in user space?
Bearing in mind: all apps actually look at the SAME 2 GB system space, and this is virtualized for them by virtual addressing.
Or does the linker always load DLLs into kernel space, so that all apps can use them?
"DLLs are only ever really loaded once."
This is not correct. A DLL is mapped into the virtual address space of each process that uses it, either by the operating system's loader when the process starts, or when you ask for it through API functions like LoadLibrary. Each process gets its own mapping (read-only pages such as code can be backed by the same physical memory, but writable data is private to each process), and the DLL is initialized separately in each process.
There is no global "system space" into which user-mode DLLs are placed for all processes to use at once. Each process has its own private virtual address range (4 GB on 32-bit Windows, of which normally 2 GB is usable by the application). If you overwrite parts of a DLL in your own virtual memory, the copies of the DLL in other processes are not affected. If it weren't like this, one process could easily crash the whole system.

How to prevent a third-party lib (no source code) from allocating memory from physical memory?

Suppose this scenario: I use a third-party lib in my C++ app, but I don't want the third-party lib to use my physical memory at all. Instead, I want it to only allocate memory from the hard disk. I don't have the source code of the third-party lib; however, as it runs on the Windows platform, I think it's possible to control its memory management with the Win32 API.
My problem is how to prevent the third-party lib from allocating memory from physical memory.
Am I going in the wrong direction? Can anybody help me?
PS: I'm using Visual C++ 2010.
For a regular C++ program or library there's no such thing as "allocating physical memory" or "allocating memory from hard disk" in Windows. All "normal" allocation requests are served by virtual memory. It is up to the operating system to decide which virtual memory region will reside in physical RAM and which will reside on disk at any given moment. Neither your program nor the third-party library has any control over this.
In other words, the "problem" you seem to describe does not really exist. In a properly designed OS based on virtual memory, the physical RAM is always fully occupied. Unoccupied RAM is wasted RAM - this is the governing principle behind this. That means that the concept of "saving physical RAM" does not really exist in such OS: the physical RAM is always 100% occupied anyway.
In order to make data stored in virtual memory accessible, the OS first has to make sure that the data is loaded into physical RAM. For this reason, any library that uses memory will have its data loaded into physical RAM, whether you want it or not. Otherwise, that third-party library simply wouldn't be able to function at all.

Are memory modules mapped into process' virtual space?

I see that on Windows the function EnumProcessModules returns a number of modules loaded for a specified process (some of these should be system DLLs like guard32.dll, version.dll, etc.).
My question is: are these modules mapped into the process's virtual address space? Can I jump to an instruction located in one of these modules (knowing its address, of course) from the main app code?
Yes, the DLLs should be mapped into the process's virtual address space. A mapped page may not be backed by physical memory yet if nothing in it has been accessed. And of course, executing "random" bits of code without the right initialization or setup (e.g. calling a processing function that uses data which has to be allocated by another function first) will clearly end badly, by some definition of bad. Also bear in mind that the DLL may well be loaded at different addresses on different runs of the same program, so you can't rely on the address of the DLL being constant, and it may well be completely different on another machine.
Yes, just call GetProcAddress with the module handle you got from EnumProcessModules (this only works for modules loaded in your own process). GetProcAddress calculates the function's address from its offset within the module.
Yes, any DLL code that can be invoked directly from your own executable must be mapped into your process space. You can get a precise map of your process's virtual memory using Sysinternals' VMMap utility: http://technet.microsoft.com/en-us/sysinternals/dd535533
As mentioned in other answers, the virtual address space is largely, if not entirely, dynamic.
There are cases where certain shared libraries are not directly accessible from your process. These are typically sandboxed (secured) kernel or driver libraries, which are invoked through a special secure layer/API that performs parameter validation and then executes a ring/context switch into a different virtual process address space, or passes the command on via a secured inter-thread communication queue. These are expensive operations so they are typically reserved for use only when there are benefits to system stability.

DLL size in memory & size on the hard disk

Is there a relationship between a DLL's size in memory and its size on the hard disk?
I am asking because I am using the Task Manager extension (MS): I can go to an EXE in the list and right-click -> Module, and then I can see all the DLLs this EXE is using. It has a Length column, but is it in bytes? And the Length value of a DLL seems to be different from the DLL's size on the hard disk. Why?
There's a relationship, but it's not entirely direct or straightforward.
When your DLL is first used, it gets mapped into memory. That doesn't load it into memory; it just allocates some address space in your process where it can be loaded when/if needed. Then, individual pages of the DLL get loaded into memory via demand paging, i.e., when you refer to some of the address space that was allocated, the code (or data) mapped to that address will be loaded if it's not already in memory.
Now, the address mapping does take up a little space for page tables: with 4 KB pages, one 4 KB page table covers 4 MB of address space on classic 32-bit x86, or 2 MB with PAE and on x64. Of course, when you load some data into memory, that uses up memory too.
Note, however, that most pages can/will be shared between processes too, so if your DLL was used by 5 different processes at once, it would be mapped 5 times (i.e., once into each process that used it), but there would still normally be only one physical copy in memory.
Between those, it can be a little difficult to even pin down exactly what you mean by the memory consumption of a particular DLL.
There are two parts that come into play in determining the size of a dll in memory:
As everyone else pointed out, DLLs get memory-mapped, which means their size in memory is page-aligned (one of the reasons preferred load addresses back in the day had to be page-aligned). Generally, the page size is 4 KB on both 32-bit and 64-bit x86 Windows (8 KB was used on Itanium); for a more in-depth look at this on Windows, see this.
DLLs contain a section for uninitialized data (.bss). On disk this section takes up little or no space, since the file only records its size. When the DLL is loaded, the space for the .bss section is allocated and zero-filled, increasing the DLL's size in memory. Generally this is small and will be absorbed by the page alignment, but if a DLL contains huge static buffers, this can balloon its virtual size.
The memory footprint will usually be bigger than the on-disk size because, when the DLL is mapped into memory, its sections are page-aligned. The standard page size on x86 and x64 Windows is 4 KB, so if your DLL has 1 KB of code, it's still going to use 4 KB in memory.
Don't think of a .dll or a .exe as something that gets copied into memory to be executed.
Think of it as a set of instructions for the loader.
Sure, it contains the program and static data text.
More importantly, it contains all the information allowing that text to be relocated, and to have all its unsatisfied references hooked up, and to export references that other modules may need.
Then if there's symbol and line number information for debugging, that's still more text.
So in general you would expect it to be larger than the memory image.
It all depends on what you call "memory", and what exactly does your TaskManager extension show.
Every executable module (EXE/DLL) is mapped into an address space. The size of this mapping equals the module's image size (SizeOfImage in its PE header), and I guess this is what your "extension" shows you.

shared library address space

While I was studying shared libraries, I read this statement:
Although the code of a shared library is shared among multiple
processes, its variables are not. Each process that uses the library
has its own copies of the global and static variables that are defined
within the library.
I just have a few doubts.
Is the code part of each process in a separate address space?
Is the shared-library code part in some global (unique) address space?
I am just a beginner, so please help me understand.
Thanks!
Shared libraries are loaded into a process by memory-mapping the file into some portion of the process's address-space. When multiple processes load the same library, the OS simply lets them share the same physical RAM.
Portions of the library that can be modified, such as static globals, are generally loaded in copy-on-write mode: when a write is attempted, a page fault occurs, the kernel responds by copying the affected page to another physical page of RAM (for that process only), the mapping is redirected to the new page, and finally the write operation completes.
To answer your specific points:
All processes have their own address space. The sharing of physical memory between processes is invisible to each process (unless they do so deliberately via a shared memory API).
All data and code live in physical RAM, which is a kind of address space. Most of the addresses you are likely to see, however, are virtual memory addresses belonging to the address space of one process or another, even if that "process" is the kernel.