Properties of dynamic link libraries (DLLs) in Windows - C++

As per Microsoft (see the first point in this), a DLL can have only one instance of itself running in a system at one time. However, from what I've read elsewhere online, including here on SO, processes can load multiple instances of the same DLL: read-only data in the DLL may be shared between processes using memory-mapping techniques, but each process keeps its own copy of the DLL's writable data in its own memory space.
Also, according to the second point at the same link, a DLL can't have its own stack, memory handles, global memory, etc. But as I understand it, since a DLL can export and/or contain multiple functions, those functions must have their own stacks, file handles, and so on. And why can't a global variable defined in a DLL be considered global memory?
I'm working in C++.

A DLL considered in isolation doesn't make much sense. To understand it better, think of a DLL in the context of the process it is loaded into.
The documentation is correct. Threads that run code/exported functions within the DLL have their own stacks; memory handles, global memory, etc. belong to processes, not to individual threads or to the DLL itself.
If you have a global variable defined in a DLL, it is global in the context of the process the DLL is mapped into.
If a DLL is mapped into multiple processes, each process gets its own copy of that global variable.
It's part of maintaining process isolation/integrity (each process has its own memory area, handle tables, etc.).
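As a concrete illustration (a minimal sketch; the file, function, and variable names are made up, and the build command is just one way to do it), each process that loads the DLL ends up with its own copy of the writable global:

    // counter.cpp -- built as a DLL, e.g.  cl /LD counter.cpp
    static int g_calls = 0;                // writable global, lives in the DLL's data section

    extern "C" __declspec(dllexport) int Bump()
    {
        return ++g_calls;                  // touches this process's private copy of g_calls
    }

    // If process A and process B both load counter.dll and each calls Bump() three
    // times, both see 1, 2, 3: the code pages can be shared between them, but every
    // process gets its own instance of g_calls in its own address space.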
HTH

Related

Are shared objects/DLLs loaded by different processes into different areas of memory?

I'm trying to figure out how an operating system handles multiple unrelated processes loading the same DLL/shared library. The OSes I'm concerned with are Linux and Windows, but to a lesser extent Mac as well. I presume the answers to my questions will be identical for all operating systems.
I'm particularly interested in regards to explicit linking, but I'd also like to know for implicit linking. I presume the answers for both will also be identical.
This is the best explanation I've found so far concerning Windows:
"The system maintains a per-process reference count on all loaded modules. Calling LoadLibrary increments the reference count. Calling the FreeLibrary or FreeLibraryAndExitThread function decrements the reference count. The system unloads a module when its reference count reaches zero or when the process terminates (regardless of the reference count)." - http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175%28v=vs.85%29.aspx
But it leaves some questions.
1.) Do unrelated processes load the same DLL redundantly (that is, the DLL exists more than once in memory) instead of using reference counting? (i.e., into each process's own "address space", as I think I understand it)
If the DLL is unloaded as soon as a process terminates, that leads me to believe the other processes using the exact same DLL must have it redundantly loaded into memory; otherwise the system shouldn't be allowed to ignore the reference count.
2.) If that is true, then what's the point of reference counting DLLs when you load them multiple times in the same process? What would be the point of loading the same DLL twice into the same process? The only feasible reason I can come up with is that if an EXE references two DLLs, and one of the DLLs references the other, there will be at least two LoadLibrary() and two FreeLibrary() calls for the same library.
I know it seems like I'm answering my own questions here, but I'm just postulating. I'd like to know for sure.
The shared library or DLL will be loaded once for the code part, and multiple times for any writable data parts [possibly via "copy-on-write", so if you have a large chunk of memory that is mostly read with only small parts written, all the processes can keep sharing the same pages as long as they haven't been changed from their original values].
It is POSSIBLE for a DLL to be loaded more than once, however. When a DLL is loaded, it is loaded at a base address, which is where its code starts. If some process uses, say, two DLLs that, because of how they were previously loaded, want the same base address [because the other processes using them don't use both], then one of the DLLs will have to be loaded again at a different base address. For most DLLs this is rather unusual, but it can happen.
The point of reference counting every load is that it allows the system to know when it is safe to unload the module (when the reference count reaches zero). If two distinct parts of the system both want to use the same DLL and both load it, you don't want the system to crash when the first part closes the DLL. But you also don't want the DLL to stay in memory after the second part has closed it, because that would be a waste of memory. [Imagine that the application is a process running on a server, and a new DLL (with a different name) is downloaded every week; after a few months, memory would be full of the application's old, unused DLLs.] There are of course also scenarios such as the one you describe, where a DLL loads another DLL using the LoadLibrary call, and the main executable loads that very same DLL. Again, you do need two FreeLibrary calls to close it.
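A small sketch of that per-process reference counting (version.dll is just a convenient system DLL to demonstrate with; error handling is omitted):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Two independent parts of the same process load the same module.
        HMODULE a = LoadLibraryW(L"version.dll");   // per-process ref count: 1
        HMODULE b = LoadLibraryW(L"version.dll");   // same module again, ref count: 2

        std::printf("same mapping: %s\n", (a == b) ? "yes" : "no");   // prints "yes"

        FreeLibrary(a);   // ref count drops to 1 -- the module stays mapped
        FreeLibrary(b);   // ref count drops to 0 -- the module is unmapped from this process
        return 0;
    }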

Are memory modules mapped into process' virtual space?

I see that on Windows the function EnumProcessModules returns the modules loaded by a specified process (some of these are system DLLs like guard32.dll, version.dll, etc.).
My question is: are these modules mapped into the process's virtual space? Can I jump to an instruction located in one of these modules (knowing its address, of course) from the main app code?
Yes, the DLLs should be mapped into the process's virtual address space. The mapping may not be backed by a real physical page if the code in that page has not been executed yet, and of course executing "random" bits of code without the right initialization or setup for the code to run properly (e.g. calling a processing function that uses data which needs to be allocated by another function) will clearly end badly by some definition of bad. Also bear in mind that the DLL may well be loaded at different addresses on different runs of the same code, so you can't rely on the address of the DLL being constant - and it may well be completely different on another machine.
Yes, just call GetProcAddress using the module handle you got from EnumProcessModules. GetProcAddress calculates the function's offset within the module.
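A minimal sketch of that combination, run against the current process (you may need to link against Psapi.lib depending on the SDK; GetTickCount is just an example export to resolve):

    #include <windows.h>
    #include <psapi.h>
    #include <cstdio>

    int main()
    {
        // List the modules mapped into our own process's address space.
        HMODULE modules[1024];
        DWORD needed = 0;
        if (EnumProcessModules(GetCurrentProcess(), modules, sizeof(modules), &needed))
        {
            const DWORD count = needed / sizeof(HMODULE);
            for (DWORD i = 0; i < count; ++i)
            {
                char name[MAX_PATH];
                if (GetModuleFileNameA(modules[i], name, MAX_PATH))
                    std::printf("%p  %s\n", static_cast<void*>(modules[i]), name);
            }
        }

        // GetProcAddress resolves an export to its address inside the mapped module;
        // because the module lives in our address space, we can call it directly.
        HMODULE k32 = GetModuleHandleW(L"kernel32.dll");
        auto tick = reinterpret_cast<DWORD (WINAPI*)(void)>(GetProcAddress(k32, "GetTickCount"));
        if (tick)
            std::printf("GetTickCount() = %lu\n", tick());
        return 0;
    }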
Yes, any DLL code that can be invoked directly from your own executable must be mapped into your process space. You can get a precise chart of your process's virtual memory space using Sysinternals' VMMap utility: http://technet.microsoft.com/en-us/sysinternals/dd535533
As mentioned in other answers, the virtual address space is largely, if not entirely, dynamic.
There are cases where certain shared libraries are not directly accessible from your process. These are typically sandboxed (secured) kernel or driver libraries, which are invoked through a special secure layer/API that performs parameter validation and then executes a ring/context switch into a different virtual process address space, or passes the command on via a secured inter-thread communication queue. These are expensive operations so they are typically reserved for use only when there are benefits to system stability.

shared library address space

While I was studying about shared library I read a statement
Although the code of a shared library is shared among multiple
processes, its variables are not. Each process that uses the library
has its own copies of the global and static variables that are defined
within the library.
I just have a few doubts.
Is the code part of each process in a separate address space?
Is the shared-library code part in some global (unique) address space?
I am just a starter so please help me understand.
Thanks!
Shared libraries are loaded into a process by memory-mapping the file into some portion of the process's address-space. When multiple processes load the same library, the OS simply lets them share the same physical RAM.
Portions of the library that can be modified, such as static globals, are generally loaded in copy-on-write mode: when a write is attempted, a page fault occurs, the kernel responds by copying the affected page to another physical page of RAM (for that process only), the mapping is redirected to the new page, and finally the write operation completes.
To answer your specific points:
All processes have their own address space. The sharing of physical memory between processes is invisible to each process (unless they do so deliberately via a shared memory API).
All data and code live in physical RAM, which is a kind of address space. Most of the addresses you are likely to see, however, are virtual memory addresses belonging to the address space of one process or another, even if that "process" is the kernel.
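A tiny sketch of that distinction (the exact behaviour assumes address-space layout randomisation is enabled, which it is by default on modern Linux and Windows):

    #include <cstdio>

    int main()
    {
        // Print the virtual address at which this process sees a C-library function.
        // Run the program twice: with ASLR the two processes typically report
        // different numbers, even though the same physical pages of the C runtime
        // can back both mappings -- the values are per-process virtual addresses,
        // not physical ones.
        std::printf("printf is mapped at %p in this process\n",
                    reinterpret_cast<void*>(&std::printf));
        return 0;
    }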

DLL used by a program, where are the variables declared in it stored?

I have a program (not mine, I have no source code) which exposes an interface so I can write a DLL that will be called by the program. Now I wondered: when I declare some variable in this DLL I make, in what memory space is it going to be stored?
I mean, it's just going to sit in the EXE's address space, right? How is the DLL loaded in relation to the EXE, though? I thought a DLL was only ever loaded into memory once, so how does that work in relation to me creating local variables in my DLL? (like objects, classes, etc.)
A DLL is loaded once per process. Once upon a time DLLs were shared between processes, but that hasn't been the case since Windows 3.1 went the way of the dodo.
Any global variables that you declare in your DLL will be stored in a data page. A different page from the EXE's global variables, mind.
Now, if you allocate memory on the heap, whether or not your allocations are mixed in with the EXE's depends on which heap you use. If both the EXE and the DLL use the same runtime linked as a DLL, they will both get memory from the same heap. If they have different runtimes, or link against the runtime statically, they'll get different heaps. This becomes a very big can of worms, so I shan't go any further here.
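To make the storage story concrete, here is a minimal sketch of a DLL of the kind the question describes (the file, function, and variable names are made up):

    // plugin.cpp -- built as the DLL the host application loads
    #include <string>
    #include <vector>

    static std::vector<std::string> g_log;   // global: sits in this DLL's data pages,
                                             // inside the host process's address space

    extern "C" __declspec(dllexport) void AddEntry(const char* text)
    {
        std::string entry(text);             // local object: on the calling thread's stack
        g_log.push_back(entry);              // its characters and the vector's storage come
                                             // from the heap of whichever runtime this DLL uses
    }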
Your DLL will declare a DllMain, which is the equivalent of the entry point in a regular executable. When your DLL is loaded, your DllMain gets called. Here is a link to the best practices for what should be done in there.
Usually you will do some sort of initialisation there. When your DLL is loaded, it is loaded into the virtual memory space of the executable that called LoadLibrary. LoadLibrary handles all the mapping and relocations that need to be dealt with. From that point on, all memory you allocate or modify through your DLL is in the same virtual memory space as the process it's mapped into.
Presumably the executable interfaces with it by loading your DLL and then calling some sort of exported function in it. Basically, everything you do once your DLL is loaded happens within the memory space of the process it is loaded into.
If you want to know more about exactly what goes on when your DLL is loaded you should look into the semantics of LoadLibrary().
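For reference, a minimal DllMain skeleton along the lines described above (a sketch, not a drop-in implementation; keep the work done here small, since the loader lock is held while it runs):

    #include <windows.h>

    BOOL WINAPI DllMain(HINSTANCE hinstDll, DWORD reason, LPVOID /*reserved*/)
    {
        switch (reason)
        {
        case DLL_PROCESS_ATTACH:                   // the DLL was just mapped into this process
            DisableThreadLibraryCalls(hinstDll);   // optional: skip per-thread notifications
            break;
        case DLL_PROCESS_DETACH:                   // unmapping (FreeLibrary or process exit)
            break;
        case DLL_THREAD_ATTACH:
        case DLL_THREAD_DETACH:
            break;
        }
        return TRUE;                               // FALSE from PROCESS_ATTACH fails the load
    }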

DLL Memory Management

I have a few doubts regarding how Windows manages a DLL's memory.
When DLLs are loaded into the host process, how is the memory managed? Does the DLL get access to the entire memory available to the host process, or just a portion of it? I.e., is there a limitation on memory allocated by a function inside the DLL?
Will STL classes like string and vector (dynamically increasing storage) used by the DLL work without issue here?
"Memory management" is a split responsibility, typically. The OS hands address space in big chunks to the runtime, which then hands it out in smaller bits to the program. This address space may or may not have RAM allocated. (If not, there will be swap space to back it)
Basically, when a DLL is loaded, Windows allocates address space for the code and data segments, and calls DllMain(). The C++ compiler will have arranged for global ctors to be called from DllMain(). If it's a DLL written in C++, it will likely depend on a C++ runtime DLL, which in turn will depend on Kernel32.DLL and User32.DLL. Windows understands such dependencies and will arrange for them to be loaded in the correct order.
There is only one address space per process, so a DLL will get access to all memory of the process. If a DLL is loaded in two processes, there will be two logical copies of the code and the data (copies of the code and read-only data might share the same physical RAM, though).
If the DLL allocates memory using OS functions, Windows will allocate the memory to the process from which the DLL made that allocation. The process must return the memory, but any code in the process may do so. If your DLL allocates memory using C++ functions, it will do so by calling operator new in the C++ runtime DLL. That memory must be returned by calling operator delete in the (same) C++ runtime DLL. Again, it doesn't matter who does that.
STL classes like vector<> can be multiply instantiated, but it doesn't matter as long as you're using the same compiler. All instantiations will be substantially equal, and all will return the vector's memory to the same deallocation function.
There are 2 main assumptions in this explanation:
The EXE and its DLLs are all compiled with the same compiler
The EXE and its DLLs all link against the C++ runtime DLL (i.e. not statically linked)
Static linking against the C++ runtime is useful if you want to ship a single, self-contained EXE. But if you're already shipping DLLs, you should keep the C++ runtime in its own DLL too.
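One common way to honour the "return memory via the same runtime" rule, regardless of how the EXE is built, is to export an allocate/free pair from the DLL (a sketch; Widget, CreateWidget, and DestroyWidget are illustrative names, not an established API):

    // widget.cpp -- built as a DLL; allocation and deallocation both happen in this module
    struct Widget { int value = 0; };

    extern "C" __declspec(dllexport) Widget* CreateWidget()
    {
        return new Widget{};        // allocated with this DLL's operator new
    }

    extern "C" __declspec(dllexport) void DestroyWidget(Widget* w)
    {
        delete w;                   // returned to the same runtime's operator delete
    }

    // The EXE pairs every CreateWidget() with a DestroyWidget() instead of calling
    // delete itself, so it never frees memory on a heap it does not own.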
Does the DLL get access to the entire memory available to the host process, or just a portion of it? I.e., is there a limitation on memory allocated by a function inside the DLL?
After a DLL has been loaded into the host process, there is no distinction whatsoever between code "living" in the DLL and code "living" in the original executable module. For the process being executed, all memory ranges are the same, whether they come from a DLL or from the original executable.
There are no differences as to what the code from the DLL can do vs. what the code compiled in the original exec module can do.
That said, there are differences when using the heap - these are explained in the questions Space_C0wb0y linked to in the comments.
Will STL classes like string and vector (dynamically increasing storage) used by the DLL work without issue here?
They will create issues (solvable ones, but still) if you use them in the interface of your DLL. They will not (or should only under very rare circumstances) create issues if you do not use them at the DLL interface level. I am sure there are a few more specific questions and answers around for this.
Basically, if you use them at the interface level, the DLL and the EXE have to be compiled with "exactly" the same flags, i.e. the types need to be binary compatible. That is, if the compiler flags (optimization, etc.) in your DLL differ from the ones in the EXE such that a std::string is laid out differently in memory in the EXE vs. the DLL, then passing a string object between the two will result in a crash or silent errors (or demons flying out of your nose).
If you only use the STL types inside functions, or between functions internal to your DLL, then their compatibility with the EXE doesn't matter.
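One way to follow that advice is to keep STL types internal to the DLL and expose a plain-C boundary instead (a sketch; GetName and its buffer convention are illustrative, not a fixed API):

    #include <string>
    #include <cstring>

    static std::string BuildName()                   // STL use stays internal to the DLL
    {
        return "example";
    }

    // No std::string crosses the DLL boundary, so the EXE's compiler settings
    // don't have to match the DLL's for this call to be safe.
    extern "C" __declspec(dllexport) int GetName(char* buffer, int bufferSize)
    {
        const std::string name = BuildName();
        const int needed = static_cast<int>(name.size()) + 1;
        if (buffer == nullptr || bufferSize < needed)
            return needed;                           // tell the caller how much space is required
        std::memcpy(buffer, name.c_str(), needed);
        return 0;                                    // success
    }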