Is a DLL loaded entirely or only some functions? - c++

When a program uses a dynamic shared library, does it load the DLL entirely (so that you could almost erase the DLL from disk while the application is running), or does it load only the parts of the DLL it needs at each point during the application's lifetime?

The DLL gets loaded entirely. DLLs are the same as EXEs in almost all aspects; the only big difference is that DLLs are not executable on their own. A DLL doesn't have a main() function, the starting point of a program.

I don't know how the details work in Windows (in Linux I know the responsible code in the kernel quite well), but at least in *nix systems deleting a filesystem entry leaves the file contents intact as long as there are file descriptors/handles open on it; only after the last file descriptor/handle is closed may the blocks on the storage device get overwritten. Windows does not follow this behaviour for a loaded DLL: the file backing a mapped image generally cannot be deleted while it is in use.
DLLs are not loaded into preallocated memory. They're memory mapped. This is kind of the reverse of swap memory: instead of swapping RAM out to disk, the contents of the file are mapped into the process address space and end up in RAM through the disk/file cache. The same goes for shared objects in *nix operating systems. But there are significant differences in how Windows and *nix systems deal with relocations, symbol exports and so on.
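To make the *nix side of this concrete, here is a minimal POSIX-only sketch (not Windows code, and not how the dynamic loader itself is written). It maps a scratch file, removes its directory entry, and still reads the contents through the mapping. The function name and the /tmp path are made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Writes payload to a scratch file, maps the file, unlinks it, and then
// reads the bytes back through the mapping. Returns an empty string on error.
std::string read_through_mapping_after_unlink(const std::string& payload) {
    assert(!payload.empty());                 // mmap rejects zero-length mappings
    char path[] = "/tmp/mapdemoXXXXXX";       // hypothetical scratch file
    int fd = mkstemp(path);
    if (fd < 0) return {};
    if (write(fd, payload.data(), payload.size()) !=
        static_cast<ssize_t>(payload.size())) {
        close(fd);
        unlink(path);
        return {};
    }

    // The file's pages become part of our address space; nothing is copied
    // into a preallocated buffer.
    void* view = mmap(nullptr, payload.size(), PROT_READ, MAP_SHARED, fd, 0);
    close(fd);      // the mapping alone keeps the file contents alive
    unlink(path);   // the directory entry is gone from here on
    if (view == MAP_FAILED) return {};

    std::string result(static_cast<const char*>(view), payload.size());
    munmap(view, payload.size());  // only now may the blocks be reclaimed
    return result;
}
```

Calling read_through_mapping_after_unlink("some bytes") should hand back the same bytes even though the file no longer has a name by the time they are read.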

It's being loaded entirely, as was pointed out. The special part is not that you can't run the DLL, it's that the memory pages of a DLL are usually shared across process boundaries.
Should a process attempt to write into a page, a copy of that page is taken and the copy is only visible to this process (it's called copy-on-write).
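Copy-on-write can be observed with a private file mapping. The sketch below is POSIX-only and uses mmap with MAP_PRIVATE as a stand-in for what the loader does with writable image pages; the struct, the function name and the scratch path are assumptions for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct CowResult {
    char seen_in_private_view;  // byte as seen through the modified mapping
    char seen_in_file;          // same byte read back from the file itself
};

// Maps a scratch file privately, writes to the mapping, and reports what the
// mapping and the underlying file contain afterwards.
CowResult write_to_private_mapping() {
    CowResult r{0, 0};
    char path[] = "/tmp/cowdemoXXXXXX";   // hypothetical scratch file
    int fd = mkstemp(path);
    if (fd < 0) return r;
    unlink(path);                          // contents live until fd is closed
    const char original[] = "AAAA";
    if (write(fd, original, 4) != 4) { close(fd); return r; }

    void* p = mmap(nullptr, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return r; }
    char* view = static_cast<char*>(p);
    assert(view[0] == 'A');                // still backed by the file's page

    view[0] = 'B';                         // the kernel copies the page for us
    r.seen_in_private_view = view[0];

    // The file (and any other process mapping it) keeps the original byte.
    if (pread(fd, &r.seen_in_file, 1, 0) != 1) r.seen_in_file = 0;

    munmap(p, 4);
    close(fd);
    return r;
}
```

The private view should report 'B' while the file still holds 'A', which is the same isolation a process gets when it writes into a page of a shared DLL.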
DLLs are PE files (i.e. the same as NT drivers or Win32 programs). They are loaded similarly to .exe files into Memory Mapped Files (MMFs, or "sections" in kernel mode parlance). This means that the DLL file is backing the MMF that represents the loaded DLL. This is the same as when passing a valid file handle (not INVALID_HANDLE_VALUE) to CreateFileMapping and it's also (part of) the reason why you can't delete the DLL while it is in use.
Also, there are some DLLs that contain no code at all. Such a DLL can then also be loaded into a process that was not made for the same processor. E.g. the x86 resource DLL loads fine into an x64 application.

Related

DLL loading and system image space

DLLs are only ever really loaded once. The dynamic loader will link and redirect calls if your app starts using a specific DLL, something from MS Office for example.
However, when does the repeated referencing of a DLL by various users and apps on the system push the DLL image into system space, so that all apps can use it?
Otherwise, does the loaded image remain in user space?
Bearing in mind that all apps actually look at the same 2 GB system space, and this is virtualized for them by virtual addressing,
or does the linker always load DLLs into kernel space, so that all apps can use them?
"DLLs are only ever really loaded once."
This is not correct. They are mapped into the virtual address space either when the process starts by the loader of the operating system, or when you ask for it through API functions like LoadLibrary. Each process gets a fresh copy and the DLL is initialized each time this happens.
There is no global "system space" which all processes use at once. Each process has its own private virtual address range (4 GB on 32-bit Windows, of which normally 2 GB is usable by the process). If you overwrite parts of a DLL in your own virtual memory, copies of the DLL in other processes are not affected. One process could easily crash the whole system if it weren't like this.

Can I load a library from a memory stream?

Can I load a library from a memory stream? For example, my library is encoded in a file. I check some conditions and decrypt the file into a memory stream. Now I need to load the decrypted library from that stream into my application and use its functions etc.
In Windows, a DLL can only be loaded from a file. As the links suggested, you can create a ramdisk and install that as a drive, but there is no way around the DLL needing to be loaded from a file that exists in a filesystem. Part of the reason for this is that the DLL is "demand loaded"; that is, the system does not load the entire file into memory at once, it loads the parts that are actually being used, 4 KB (typically) at a time. It is also not swapped out to the swap area; it is just discarded and reloaded from the DLL if the system is running short of memory.
Linux works in a very similar way (I know it uses the same kind of demand-loading by default, but not sure if there is a way around it), so I don't believe there is any other way there either, but I haven't looked into it at depth.
Of course, if all you want is a piece of code that you can use in your application, and you want to store that as encrypted/compressed/whatever in your executable file, what you can do is allocate some executable memory (in Windows, you can use VirtualAlloc for that) and copy the decoded code into it. However, you need to ensure that you relocate any absolute memory addresses in your code if you do that, so you will need to store the relocation information in your executable.
Clearly, the easy solution is to unpack your content into a file in the filesystem, and load from there.
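To illustrate the relocation step mentioned above, here is a simplified, platform-neutral sketch. It is not the PE relocation format: the Image struct, its fields and the helper names are all hypothetical. The idea is only that every slot holding an absolute address gets adjusted by the difference between the address the code was built for and the address it actually landed at.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory image: raw bytes plus a list of offsets at which
// 64-bit absolute addresses are stored.
struct Image {
    std::uint64_t preferred_base;            // address the code was built for
    std::vector<std::uint8_t> bytes;         // code and data
    std::vector<std::size_t> reloc_offsets;  // slots that need patching
};

// Patches every recorded slot so the image is valid at actual_base.
void apply_relocations(Image& img, std::uint64_t actual_base) {
    // Unsigned arithmetic wraps, so this also works when moving downwards.
    const std::uint64_t delta = actual_base - img.preferred_base;
    for (std::size_t off : img.reloc_offsets) {
        assert(off + sizeof(std::uint64_t) <= img.bytes.size());
        std::uint64_t addr;
        std::memcpy(&addr, img.bytes.data() + off, sizeof addr);
        addr += delta;
        std::memcpy(img.bytes.data() + off, &addr, sizeof addr);
    }
}

// Reads the 64-bit address stored at the given offset.
std::uint64_t read_address(const Image& img, std::size_t off) {
    std::uint64_t addr;
    std::memcpy(&addr, img.bytes.data() + off, sizeof addr);
    return addr;
}

// Builds a 32-byte image whose slot at offset 8 points at offset 0x18
// inside the image, expressed relative to the preferred base.
Image make_demo_image() {
    Image img{0x10000000ULL, std::vector<std::uint8_t>(32, 0), {8}};
    const std::uint64_t target = img.preferred_base + 0x18;
    std::memcpy(img.bytes.data() + 8, &target, sizeof target);
    return img;
}
```

After apply_relocations(img, new_base) the slot at offset 8 should hold new_base + 0x18, i.e. it still points at the same place inside the moved image.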

Are shared objects/DLLs loaded by different processes into different areas of memory?

I'm trying to figure out how an operating system handles multiple unrelated processes loading the same DLL/shared library. The OSes I'm concerned with are Linux and Windows, but to a lesser extent Mac as well. I presume the answers to my questions will be identical for all operating systems.
I'm particularly interested in regards to explicit linking, but I'd also like to know for implicit linking. I presume the answers for both will also be identical.
This is the best explanation I've found so far concerning Windows:
"The system maintains a per-process reference count on all loaded modules. Calling LoadLibrary increments the reference count. Calling the FreeLibrary or FreeLibraryAndExitThread function decrements the reference count. The system unloads a module when its reference count reaches zero or when the process terminates (regardless of the reference count)." - http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175%28v=vs.85%29.aspx
But it leaves some questions.
1.) Do unrelated processes load the same DLL redundantly (that is, the DLL exists more than once in memory) instead of using reference counting? (I.e., into each process's own "address space", as I think I understand it.)
If the DLL is unloaded as soon as a process is terminated, that leads me to believe the other processes using the exact same DLL will have it redundantly loaded into memory; otherwise the system should not be allowed to ignore the reference count.
2.) If that is true, then what's the point of reference counting DLLs when you load them multiple times in the same process? What would be the point of loading the same DLL twice into the same process? The only feasible reason I can come up with is that if an EXE references two DLLs, and one of the DLLs references the other, there will be at least two LoadLibrary() and two FreeLibrary() calls for the same library.
I know it seems like I'm answering my own questions here, but I'm just postulating. I'd like to know for sure.
The shared library or DLL will be loaded once for the code part, and multiple times for any writeable data parts [possibly via "copy-on-write", so if you have a large chunk of memory which is mostly read, but with some small parts being written, all the processes using the DLL can share the same pages as long as they haven't been changed from their original value].
It is POSSIBLE that a DLL will be loaded more than once, however. When a DLL is loaded, it is loaded at a base address, which is where the code starts. If we have some process which is using, say, two DLLs that, because of their previous loading, use the same base address [because the other processes using them don't use both], then one of the DLLs will have to be loaded again at a different base address. For most DLLs this is rather unusual. But it can happen.
The point of reference counting every load is that it allows the system to know when it is safe to unload the module (when the reference count is zero). If we have two distinct parts of the system, both wanting to use the same DLL, and they both load that DLL, you don't really want to cause the system to crash when the first part of the system closes the DLL. But we also don't want the DLL to stay in memory when the second part of the system has closed the DLL, because that would be a waste of memory. [Imagine that this application is a process that runs on a server, and new DLLs are downloaded every week from a server, so each week the "latest" DLL (which has a different name) is loaded. After a few months, you'd have the entire memory full of this application's "old, unused" DLLs.] There are of course also scenarios such as the one you describe, where a DLL loads another DLL using the LoadLibrary call, and the main executable loads the very same DLL. Again, you do need two FreeLibrary calls to close it.
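The bookkeeping described above can be modelled in a few lines. This is only a toy model of a per-process module table, not the real loader; the ModuleTable class and its method names are invented for the example, with load() standing in for LoadLibrary and release() for FreeLibrary.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a per-process table of loaded modules.
class ModuleTable {
public:
    // Models LoadLibrary: the first call "maps" the module, later calls only
    // bump the count. Returns the reference count after the call.
    int load(const std::string& name) { return ++refs_[name]; }

    // Models FreeLibrary: drops one reference and "unmaps" the module when
    // the count reaches zero. Returns the remaining count, or -1 if the
    // module was not loaded at all.
    int release(const std::string& name) {
        auto it = refs_.find(name);
        if (it == refs_.end()) return -1;
        const int remaining = --it->second;
        assert(remaining >= 0);
        if (remaining == 0) refs_.erase(it);
        return remaining;
    }

    bool is_mapped(const std::string& name) const {
        return refs_.count(name) != 0;
    }

private:
    std::map<std::string, int> refs_;  // module name -> reference count
};
```

With this model, two load("plugin.dll") calls need two release("plugin.dll") calls before is_mapped("plugin.dll") turns false, which mirrors the EXE-plus-DLL scenario where both load the same library.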

DLL used by a program, where are the variables declared in it stored?

I have a program (not mine, have no source code) which exposes an interface so I can write a DLL which will be called by my program. Now I wondered when I declare some variable in this DLL I make, in what memory space is this going to be stored?
I mean, it's just gonna sit in the memory space of the EXE's address space, right? How is the DLL loaded in regards to the EXE though? I thought a DLL was only ever loaded in memory once, so how does that work in relation to me creating local variables in my DLL? (like objects, classes etc)
A DLL is loaded once per process. Once upon a time DLLs were shared between processes, but that hasn't been the case since Windows 3.1 went the way of the dodo.
Any global variables that you declare in your DLL will be stored in a data page. A different page from the EXE's global variables, mind.
Now, if you allocate memory on the heap, whether or not your allocations are mixed in with the EXE's depends on which heap you use. If both EXE and DLL use the same runtime, linked as a DLL, then they will both get memory from the same heap. If they have different runtimes, or link against the runtime statically, they'll get different heaps. This becomes a very big can of worms, so I shan't go any further here.
Your DLL can define a DllMain, which is the equivalent of the entry point in a regular executable. When your DLL is loaded, your DllMain gets called. Microsoft's DLL best-practices documentation covers what should and should not be done in there.
Usually you will do some sort of initialisation there. When your DLL is loaded, it is loaded into the virtual memory space of the executable that called LoadLibrary. LoadLibrary handles all the mapping and relocations that need to be dealt with. From this point on, all memory you allocate or modify through your DLL is in the same virtual memory space as the process it's mapped into.
Presumably the executable interfaces by loading your DLL then calling some sort of exported function in it. Basically everything that you do once your DLL is loaded will be within the memory space of the process it is loaded into.
If you want to know more about exactly what goes on when your DLL is loaded you should look into the semantics of LoadLibrary().

Dll Memory Management

I have a few doubts regarding how Windows manages a .dll's memory.
When .dlls are loaded into the host process, how is the memory managed?
Does a .dll get access to the entire memory available to the host process or just a portion of it? I.e. is there a limitation when memory is allocated by a function inside the .dll?
Will STL classes like string, vector (dynamically increasing storage) etc. used by the dll work without issue here?
"Memory management" is a split responsibility, typically. The OS hands address space in big chunks to the runtime, which then hands it out in smaller bits to the program. This address space may or may not have RAM allocated. (If not, there will be swap space to back it)
Basically, when a DLL is loaded, Windows allocates address space for the code and data segments, and calls DllMain(). The C++ compiler will have arranged to call global ctors from DllMain(). If it's a DLL written in C++, it will likely depend on a C++ runtime DLL, which in turn will depend on Kernel32.DLL and User32.DLL. Windows understands such dependencies and will arrange for them to be loaded in the correct order.
There is only one address space for a process, so a DLL will get access to all memory of the process. If a DLL is loaded in two processes, there will be two logical copies of the code and the data (copies of the code and read-only data might share the same physical RAM, though).
If the DLL allocates memory using OS functions, Windows will allocate the memory to the process from which the DLL made that allocation. The process must return the memory, but any code in the process may do so. If your DLL allocates memory using C++ functions, it will do so by calling operator new in the C++ runtime DLL. That memory must be returned by calling operator delete in the (same) C++ runtime DLL. Again, it doesn't matter who does that.
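One common way to make sure memory goes back to the allocator it came from is to export a matching create/destroy pair and have callers release objects only through it. The sketch below shows the shape of that pattern in a single translation unit; Widget, widget_create, widget_destroy and the other names are hypothetical, and a real DLL would additionally mark the functions as exported (for example with __declspec(dllexport)) and keep Widget opaque to the EXE.

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- This part would live in the DLL ----
struct Widget { std::string label; };   // hypothetical type owned by the DLL
static int g_live_widgets = 0;          // only here to make the demo checkable

extern "C" Widget* widget_create(const char* label) {
    ++g_live_widgets;
    return new Widget{label ? label : ""};   // allocated by the DLL's runtime
}
extern "C" void widget_destroy(Widget* w) {
    if (!w) return;
    --g_live_widgets;
    delete w;                                // freed by the same runtime
}
extern "C" const char* widget_label(const Widget* w) {
    assert(w != nullptr);
    return w->label.c_str();
}
extern "C" int widget_live_count() { return g_live_widgets; }

// ---- This part would live in the EXE ----
// A deleter that routes destruction back through the DLL's own function.
struct WidgetDeleter {
    void operator()(Widget* w) const { widget_destroy(w); }
};
using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;

WidgetPtr make_widget(const char* label) {
    return WidgetPtr(widget_create(label));
}
```

With this arrangement the EXE never calls delete on a Widget itself, so the pattern keeps working even if the two modules end up with different runtimes.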
STL classes like vector<> can be multiply instantiated, but it doesn't matter as long as you're using the same compiler. All instantiations will be substantially equal, and all will return the vector's memory to the same deallocation function.
There are 2 main assumptions in this explanation:
The EXE and its DLLs are all compiled with the same compiler
The EXE and its DLLs all link against the C++ runtime DLL (i.e. not statically linked)
Static linking against the C++ runtime is useful if you want to ship a single, self-contained EXE. But if you're already shipping DLLs, you should keep the C++ runtime in its own DLL too.
"Does .dll get access to the entire memory available to the host process or just a portion of it? i.e. is there a limitation when memory is allocated by a function inside the .dll?"
After a DLL has been loaded into the host process, there is no distinction whatsoever for code "living" in the DLL vs. code "living" in the original executable module. For the process being executed all memory ranges are the same, whether they come from a DLL or from the original executable.
There are no differences as to what the code from the DLL can do vs. what the code compiled in the original exec module can do.
That said, there are differences when using the heap - these are explained in the questions Space_C0wb0y provided the links for in the comments
"Will STL classes like string, vector (dynamically increasing storage) etc. used by the dll work without issue here?"
They will create issues (solvable ones, but still) if you use them in the interface of your DLL. They will not (or should only under very rare circumstances) create issues if you do not use them at the DLL interface level. I am sure there are a few more specific questions and answers around for this.
Basically, if you use them at the interface level, the DLL and the EXE have to be compiled with "exactly" the same flags, i.e. the types need to be binary compatible. If the compiler flags (optimization, etc.) in your DLL differ from the ones in the EXE such that a std::string is laid out differently in memory in the EXE vs. the DLL, then passing a string object between the two will result in a crash or silent errors (or demons flying out of your nose).
If you only use the STL types inside of functions or between functions internal to your DLL, then their compatibility with the EXE doesn't matter.
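A typical way to keep STL types out of the interface is to pass plain buffers across the boundary and rebuild the std::string on each side. The following is a minimal sketch of that idea, again in one translation unit for brevity; plugin_get_name and get_plugin_name are made-up names, and the export decoration a real DLL needs is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// ---- DLL side: only C-compatible types appear in the signature ----
// Copies at most capacity-1 characters plus a terminator into buffer and
// returns the full length of the name (without the terminator), so callers
// can size their buffer. Passing nullptr/0 just queries the length.
extern "C" std::size_t plugin_get_name(char* buffer, std::size_t capacity) {
    static const std::string name = "example-plugin";  // stays inside the DLL
    if (buffer != nullptr && capacity > 0) {
        const std::size_t n =
            name.size() < capacity - 1 ? name.size() : capacity - 1;
        std::memcpy(buffer, name.data(), n);
        buffer[n] = '\0';
    }
    return name.size();
}

// ---- EXE side: builds its own std::string with its own runtime ----
std::string get_plugin_name() {
    const std::size_t len = plugin_get_name(nullptr, 0);   // query the length
    std::string out(len + 1, '\0');                         // room for '\0'
    const std::size_t written = plugin_get_name(&out[0], out.size());
    assert(written == len);
    out.resize(len);                                        // drop terminator
    return out;
}
```

Neither module ever sees the other's std::string object, so differing compiler flags or runtimes cannot disagree about its layout.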