DLL Memory Management - C++

I have a few doubts regarding how Windows manages a DLL's memory.
When a DLL is loaded into the host process, how is the memory managed? Does the DLL get access to the entire memory available to the host process, or just a portion of it? I.e., is there a limitation when memory is allocated by a function inside the DLL?
Will STL classes like string, vector (dynamically increasing storage), etc. used by the DLL work without issue here?

"Memory management" is a split responsibility, typically. The OS hands address space in big chunks to the runtime, which then hands it out in smaller bits to the program. This address space may or may not have RAM allocated. (If not, there will be swap space to back it)
Basically, when a DLL is loaded, Windows allocates address space for the code and data segments, and calls DllMain(). The C++ compiler will have arranged to call global ctors from DllMain(). If it's a DLL written in C++, it will likely depend on a C++ runtime DLL, which in turn will depend on Kernel32.DLL and User32.DLL. Windows understands such dependencies and will arrange for them to be loaded in the correct order.
There is only one address space for a process, so a DLL will get access to all memory of the process. If a DLL is loaded in two processes, there will be two logical copies of the code and the data. (Copies of the code and read-only data might share the same physical RAM, though.)
If the DLL allocates memory using OS functions, Windows will allocate the memory to the process from which the DLL made that allocation. The process must return the memory, but any code in the process may do so. If your DLL allocates memory using C++ functions, it will do so by calling operator new in the C++ runtime DLL. That memory must be returned by calling operator delete in the (same) C++ runtime DLL. Again, it doesn't matter who does that.
STL classes like vector<> can be multiply instantiated, but it doesn't matter as long as you're using the same compiler. All instantiations will be substantially equal, and all will return the vector's memory to the same deallocation function.
There are two main assumptions in this explanation:
The EXE and its DLLs are all compiled with the same compiler
The EXE and its DLLs all link against the C++ runtime DLL (i.e. not statically linked)
Static linking against the C++ runtime is useful if you want to ship a single, self-contained EXE. But if you're already shipping DLLs, you should keep the C++ runtime in its own DLL too.
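To make the shared-runtime case concrete, here is a minimal two-module sketch (the module and function names are hypothetical). It is only safe because both modules are assumed to be built with the same compiler and /MD, so they share one CRT DLL and therefore one heap:

// mylib.dll -- built with /MD (dynamic CRT)
#include <cstddef>

extern "C" __declspec(dllexport) int* MakeBuffer(std::size_t n)
{
    return new int[n];    // operator new[] lives in the shared CRT DLL
}

// app.exe -- built with the same compiler, also /MD
extern "C" __declspec(dllimport) int* MakeBuffer(std::size_t n);

int main()
{
    int* p = MakeBuffer(100);
    delete[] p;           // safe: same operator delete[] in the same CRT DLL
}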

Does the DLL get access to the entire memory available to the host process, or just a portion of it? I.e., is there a limitation when memory is allocated by a function inside the DLL?
After a DLL has been loaded into the host process, there is no distinction whatsoever for code "living" in the DLL vs. code "living" in the original executable module. For the process being executed all memory ranges are the same, whether they come from a DLL or from the original executable.
There are no differences as to what the code from the DLL can do vs. what the code compiled in the original exec module can do.
That said, there are differences when using the heap - these are explained in the questions Space_C0wb0y linked in the comments.
Will STL classes like string, vector (dynamically increasing storage), etc. used by the DLL work without issue here?
They will create issues (solvable ones, but still) if you use them in the interface of your DLL. They will not (or should only under very rare circumstances) create issues if you do not use them at the DLL interface level. I am sure there are a few more specific questions and answers around for this.
Basically, if you use them at the interface level, the DLL and the EXE have to be compiled with "exactly" the same flags, i.e. the types need to be binary compatible. That is, if the compiler flags (optimization, etc.) in your DLL differ from the ones in the EXE such that a std::string is laid out differently in memory in the EXE vs. the DLL, then passing a string object between the two will result in a crash or silent errors (or demons flying out of your nose).
If you only use the STL types inside of functions or between functions internal to your DLL, then their compatibility with the EXE doesn't matter.
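One common way to sidestep the problem is to keep STL types out of the exported signatures entirely and pass plain C types instead. A hedged sketch (the exported function is hypothetical):

#include <cstring>
#include <string>

// The exported signature uses only plain C types; internally the DLL is
// free to use std::string, because it never crosses the boundary.
extern "C" __declspec(dllexport)
std::size_t GetName(char* buffer, std::size_t bufferSize)
{
    std::string name = "example";        // STL used only inside the DLL
    std::size_t needed = name.size() + 1;
    if (buffer && bufferSize >= needed)
        std::memcpy(buffer, name.c_str(), needed);
    return needed;                       // caller owns the buffer it passed in
}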

Related

How can you track memory across DLL boundaries

I want performant run-time memory metrics, so I wrote a memory tracker based on overloading new & delete. It basically lets you walk your allocations in the heap and analyze everything about them - fragmentation, size, time, number, callstack, etc. But it has two fatal flaws: it can't track memory allocated in other DLLs, and when ownership of objects is passed to DLLs or vice versa, crashes ensue. And some smaller flaws: if a user uses malloc instead of new, it's untracked, and likewise if a user defines a class-specific new/delete.
How can I eliminate these flaws? I think I must be going about this fundamentally incorrectly by overloading new/delete, is there a better way?
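For context, the overloading approach described above boils down to something like the following sketch (the tracking hooks are hypothetical placeholders). Note that its reach ends at the module it is linked into, which is exactly the first flaw mentioned:

#include <cstdlib>
#include <new>

void* operator new(std::size_t size)
{
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    // TrackAllocation(p, size);    // hypothetical bookkeeping hook
    return p;
}

void operator delete(void* p) noexcept
{
    // TrackDeallocation(p);        // hypothetical bookkeeping hook
    std::free(p);
}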
The right way to implement this is to use detours and a separate tool that runs in its own process. The procedure is roughly the following:
Allocate memory in the remote process.
Place the code of a small loader there that will load your DLL.
Call the CreateRemoteThread API to run your loader.
From inside the loaded DLL, establish detours (hooks, interceptors) on the alloc/dealloc functions.
Process the calls and track activity.
If you implement your tool this way, it will not matter whether the memory allocation routines are called from a DLL or directly from the EXE. Plus, you can track activity in any process, not just ones you compiled yourself.
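A minimal sketch of steps 1-3, the classic LoadLibrary injection (error handling omitted; the DLL path is whatever your tracking DLL is called). It relies on kernel32.dll being mapped at the same base address in every process, so the local address of LoadLibraryA is also valid remotely:

#include <windows.h>
#include <cstring>

bool InjectDll(DWORD pid, const char* dllPath)
{
    HANDLE process = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    // 1. Allocate memory in the remote process for the DLL path.
    SIZE_T len = std::strlen(dllPath) + 1;
    LPVOID remote = VirtualAllocEx(process, nullptr, len,
                                   MEM_COMMIT, PAGE_READWRITE);
    // 2. Write the path into the remote process.
    WriteProcessMemory(process, remote, dllPath, len, nullptr);
    // 3. Run LoadLibraryA in the remote process with the path as argument.
    LPTHREAD_START_ROUTINE loader = (LPTHREAD_START_ROUTINE)
        GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
    HANDLE thread = CreateRemoteThread(process, nullptr, 0,
                                       loader, remote, 0, nullptr);
    WaitForSingleObject(thread, INFINITE);
    CloseHandle(thread);
    CloseHandle(process);
    return true;
}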
MS Windows allows checking the contents of the virtual address space of a remote process. You can summarize the use of the virtual address space collected this way in a histogram, like the following:
From this picture you can see how many virtual allocations of each size exist in your target process.
The picture above shows an overview of the virtual address space usage in the 32-bit MSVC DevEnv. A blue stripe means a committed piece of memory, a magenta stripe a reserved one. Green is the unoccupied part of the address space.
You can see that the lower addresses are pretty fragmented, while the middle area is not. The blue lines at high addresses are the various DLLs that are loaded into the process.
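The raw data behind such a picture can be gathered with VirtualQueryEx. A rough sketch, assuming the process handle was opened with sufficient query rights:

#include <windows.h>
#include <cstdio>

// Walk the remote process's address space region by region and print
// each region's base, size, and state (committed / reserved / free).
void DumpAddressSpace(HANDLE process)
{
    MEMORY_BASIC_INFORMATION mbi;
    unsigned char* addr = nullptr;
    while (VirtualQueryEx(process, addr, &mbi, sizeof(mbi)) != 0)
    {
        const char* state = (mbi.State == MEM_COMMIT)  ? "committed"
                          : (mbi.State == MEM_RESERVE) ? "reserved"
                          :                              "free";
        std::printf("%p %10zu %s\n", mbi.BaseAddress,
                    (size_t)mbi.RegionSize, state);
        addr = (unsigned char*)mbi.BaseAddress + mbi.RegionSize;
    }
}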
You should find out the common memory management routines that are called by new/delete and malloc/free, and intercept those. It is usually malloc/free in the end, but check to make sure.
On UNIX, I would use LD_PRELOAD with some library that re-implemented those routines. On Windows, you have to hack a little bit, but this link seems to give a good description of the process. It basically suggests that you use Detours from Microsoft Research.
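With Detours specifically, attaching a hook to malloc looks roughly like this (a sketch against the Microsoft Detours API; the tracking comment is a placeholder):

#include <windows.h>
#include <detours.h>
#include <cstdlib>

static void* (__cdecl* TrueMalloc)(size_t) = malloc;

void* __cdecl HookedMalloc(size_t size)
{
    void* p = TrueMalloc(size);
    // record (p, size) in your tracker here
    return p;
}

void InstallHooks()
{
    DetourTransactionBegin();
    DetourUpdateThread(GetCurrentThread());
    DetourAttach(&(PVOID&)TrueMalloc, HookedMalloc);  // reroute malloc
    DetourTransactionCommit();
}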
Passing ownership of objects between modules is fundamentally flawed. It showed up with your custom allocator, but there are plenty of other cases that will fail also:
compiler upgrades, and recompiling only some DLLs
mixing compilers from different vendors
statically linking the runtime library
Just to name a few. Free every object from the same module that allocated it (often by exporting a deletion function, such as IUnknown::Release()).
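In practice that usually means the DLL exports a matching create/destroy pair, so allocation and deallocation always happen in the same module. A hypothetical sketch:

struct Widget { int value; };

extern "C" __declspec(dllexport) Widget* CreateWidget()
{
    return new Widget{};    // allocated by this module's runtime
}

extern "C" __declspec(dllexport) void DestroyWidget(Widget* w)
{
    delete w;               // freed by the same runtime that allocated it
}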

DLL / SO library, how does library memory relate to the calling process's?

I was reading that all of a process's memory is released by the OS when the process terminates (by any means), negating the need to call every dtor in turn.
Now my question is how does the memory of a DLL or SO relate to clean up of alloc'd memory?
I ask because I will probably end up using Java and/or C# to call into a C++ DLL with some static C-style functions which will allocate the C++ objects on the heap. Sorry if I got carried away with the heap-vs-stack thread; I feel I have lost sight of the concept of _the_ heap (i.e. only one).
Any other potential pitfalls for memory leaks when using libraries?
The library becomes part of the process when it is loaded. Regarding tidy up of memory, handles, resources etc., the system doesn't distinguish whether they were created in the executable image or the library.
There is nothing for you to worry about. The operating system's loader takes care of this.
In general, shared libraries will be made visible to your process's address space via memory mapping (all done by the loader), and the OS keeps track of how many processes still need a given shared library. State data that is needed separately per process is typically handled by copy-on-write, so there's no danger that your crypto library might accidentally be using another process's key :-) In short, don't worry.
Edit. Perhaps you're wondering what happens if your library function calls malloc() and doesn't clean up. Well, the library's code becomes part of your process, so it is really your process that requests the memory, and so when your process terminates, the OS cleans up as usual.

DLL used by a program, where are the variables declared in it stored?

I have a program (not mine, have no source code) which exposes an interface so I can write a DLL which will be called by my program. Now I wondered when I declare some variable in this DLL I make, in what memory space is this going to be stored?
I mean, it's just gonna sit in the memory space of the EXE's address space, right? How is the DLL loaded in regards to the EXE though? I thought a DLL was only ever loaded in memory once, so how does that work in relation to me creating local variables in my DLL? (like objects, classes etc)
A DLL is loaded once per process. Once upon a time DLLs were shared between processes, but that hasn't been the case since Windows 3.1 went the way of the dodo.
Any global variables that you declare in your DLL will be stored in a data page. A different page from the EXE's global variables, mind.
Now, if you allocate memory on the heap, whether or not your allocations are mixed in with the EXE's depends on which heap you use. If both the EXE and the DLL use the same runtime linked as a DLL, then they will both get memory from the same heap. If they have different runtimes, or link against the runtime statically, they'll get different heaps. This becomes a very big can of worms, so I shan't go any further here.
Your DLL will declare a DllMain, which is the equivalent of the entry point in a regular executable. When your DLL is loaded, your DllMain gets called. Here is a link to the best practices of what should be done in there.
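For reference, a minimal DllMain skeleton looks like this; per those best practices, keep the work done here to an absolute minimum (in particular, don't call LoadLibrary from it):

#include <windows.h>

BOOL WINAPI DllMain(HINSTANCE hinstDll, DWORD reason, LPVOID reserved)
{
    switch (reason)
    {
    case DLL_PROCESS_ATTACH:   // DLL mapped into the process
    case DLL_THREAD_ATTACH:    // a new thread started in the process
    case DLL_THREAD_DETACH:    // a thread is exiting cleanly
    case DLL_PROCESS_DETACH:   // DLL unmapped from the process
        break;
    }
    return TRUE;               // FALSE from PROCESS_ATTACH aborts the load
}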
Usually you will do some sort of initialisation there. When your DLL is loaded, it is loaded into the virtual memory space of the executable that called LoadLibrary. LoadLibrary handles all the mapping and relocations that need to be dealt with. From this point, all memory you allocate or modify through your DLL is in the same virtual memory space as the process it's mapped into.
Presumably the executable interfaces by loading your DLL then calling some sort of exported function in it. Basically everything that you do once your DLL is loaded will be within the memory space of the process it is loaded into.
If you want to know more about exactly what goes on when your DLL is loaded you should look into the semantics of LoadLibrary().

Overhead of DLL

I have a quite basic question.
When a library is used by only a single process, should I keep it as a static library?
If I use the library as a DLL, but only a single process uses it, what will be the overhead?
There is almost no overhead to having a separate DLL. Basically, the first call to a function exported from a DLL will run a tiny stub that fixes up the function addresses so that subsequent calls are performed via a single jump through a jump table. The way CPUs work, this extra indirection is practically free.
The main "overhead" is actually an opportunity cost, not an "overhead" per-se. That is, modern compilers can do something called "whole program optimization" in which the entire module (.exe or .dll) is compiled and optimized at once, at link time. This means the compiler can do things like adjust calling conventions, inline functions and so on across all .cpp files in the whole program, rather than just within a single .cpp file.
This can result in a fairly nice performance boost for certain kinds of applications. But of course, whole program optimization cannot happen across DLL boundaries.
There are two overheads to a DLL. First, as the DLL is loaded into memory, the internal addresses must be fixed up for the actual address at which the DLL is loaded, versus the address assumed by the linker. This can be minimized by rebasing the DLLs. The second overhead is when the program and DLL are loaded, as the program's calls into the DLL have the addresses of the functions filled in. These overheads are generally negligible except for very large programs and DLLs.
If this is a real concern you can use delay-loaded DLLs which only get loaded as they are called. If the DLL is never used, for example it implements a very uncommon function, then it never gets loaded at all. The downside is that there's a short delay the first time the DLL is called.
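With MSVC, delay-loading is a linker setting rather than a code change. A hypothetical example, where rarely_used.dll is only mapped on the first call into it:

link app.obj rarely_used.lib delayimp.lib /DELAYLOAD:rarely_used.dll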
I like to use statically linked libraries, not to decrease the overhead but to minimize the hassle of having to keep the DLL with the program.
Imported functions have no more overhead than virtual functions.

How do I decide whether having more than one VC++ CRT state is a problem for my application?

This MSDN article says that if my application loads the VC++ runtime multiple times, because either it or some DLLs it depends on are statically linked against the VC++ runtime, then the application will have multiple CRT states, and this can lead to undefined behaviour.
How exactly do I decide if this is a problem for me? For example, in this MSDN article several examples are provided that basically say that objects maintained by the C++ runtime, such as file handles, should not be passed across DLL boundaries. What exactly is a list of things to check if I want my project to statically link against the VC++ runtime?
It's OK to have multiple copies of the CRT as long as you aren't doing certain things:
Each copy of the CRT will manage its own heap.
This can cause unexpected problems if you allocate a heap-based object in module A using 'new', then pass it to module B where you attempt to release it using 'delete'. The CRT for module B will attempt to identify the pointer in terms of its own heap, and that's where the undefined behavior comes in: if you're lucky, the CRT in module B will detect the problem and you'll get a heap corruption error. If you're unlucky, something weird will happen, and you won't notice it until much later...
You'll also have serious problems if you pass other CRT-managed things like file handles between modules.
If all of your modules dynamically link to the CRT, then they'll all share the same heap, and you can pass things around and not have to worry about where they get deleted.
If you statically link the CRT into each module, then you have to take care that your interfaces don't include these types, and that you don't allocate using 'new' or malloc in one module, and then attempt to clean up in another.
You can get around this limitation by using an OS allocation function like MS Windows's GlobalAlloc()/GlobalFree(), but this can be a pain since you have to keep track of which allocation scheme you used for each pointer.
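A sketch of that OS-allocator workaround (the wrapper names are hypothetical); because GlobalAlloc/GlobalFree live in the OS rather than in any CRT, either module may free the pointer:

#include <windows.h>

void* CrossModuleAlloc(SIZE_T bytes)
{
    return GlobalAlloc(GMEM_FIXED, bytes);   // OS heap, not a CRT heap
}

void CrossModuleFree(void* p)
{
    GlobalFree((HGLOBAL)p);                  // callable from any module
}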
Having to worry about whether the target machine has the CRT DLL your modules require, or packaging one to go along with your application can be a pain, but it's a one-time pain - and once done, you won't need to worry about all that other stuff above.
It is not that you can't pass CRT handles around. It is that you should be careful when you have two modules reading/writing the handle.
For example, in DLL A, you write
char* p = new char[10];
and in the main program, write
delete[] p;
When DLL A and your main program have two CRTs, the new and delete operations will happen in different heaps. The heap in the DLL can't manage the heap in the main program, resulting in memory leaks.
The same thing applies to file handles. Internally, a file handle may have different implementations in addition to its memory resource. Calling fopen in one module and calling fwrite and fclose in another could result in problems, as the data the FILE* points to could differ between CRT runtimes.
But you can certainly pass a FILE* or a memory pointer back and forth between two modules.
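The usual remedy is the same as for memory: keep every operation on the handle inside the module that created it, e.g. by exporting thin wrappers. A hypothetical sketch:

#include <cstdio>

extern "C" __declspec(dllexport) FILE* LogOpen(const char* path)
{
    return std::fopen(path, "w");    // this module's CRT owns the FILE*
}

extern "C" __declspec(dllexport) void LogWrite(FILE* f, const char* text)
{
    std::fputs(text, f);             // the same CRT operates on it
}

extern "C" __declspec(dllexport) void LogClose(FILE* f)
{
    std::fclose(f);                  // and the same CRT closes it
}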
One issue I've seen with having different versions of the MSVC runtime loaded is that each runtime has its own heap, so you can end up with allocations failing with "out of memory" even though there is plenty of memory available, but in the wrong heap.
The major problem with this is that the problems can occur at runtime. At the binary level, you've got DLLs making function calls, eventually to the OS. The calls themselves are fine, but at runtime the arguments are not. Of course, this can depend on the exact code paths exercised.
Worse, it could even be time- or machine-dependent. An example of that would be a shared resource, used by two DLLs and protected by a proper reference count. The last user will delete it, which could be the DLL that created it (lucky) or the other (trouble).
As far as I know, there isn't a (safe) way to check which version of the CRT a library is statically linked with.
You can, however, check the binaries with a program such as DependencyWalker. If you don't see any DLLs beginning with MSVC in the dependency list, it's most likely statically linked.
If the library is from a third party, I'd just assume it's using a different version of the CRT.
This isn't 100% accurate, of course, because the library you're looking at could be linked with another library that links with the CRT.
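A quick command-line alternative is dumpbin, which ships with Visual Studio; if no MSVC runtime DLL shows up in the output, the binary most likely links the CRT statically (the library name here is just an example):

dumpbin /dependents somelibrary.dll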
If CRT handles are a problem, then don't expose CRT handles in your DLL exports :-)
don't pass memory pointers around (if an exported class allocates some memory, it should free it... which means don't mess with inline constructor/destructor?)
This MSDN article says that you should be alerted to the problem by linker warning LNK4098. It also suggests that passing any CRT handles across a CRT boundary is likely to cause trouble, and mentions locale, low-level file I/O and memory allocation as examples.