I'm reading up on COM for a legacy project. So far, my understanding is that COM is just a binary specification, and that all implementing components (client and server) must adhere to this specification. As long as we are handling COM interfaces whose methods receive and return only simple value types, everything makes perfect sense to me.
However, it is also possible to pass pointers to entire objects/variants (containing e.g. a SAFEARRAY) to and from COM objects, and I wonder where the memory for these parameter objects is allocated. I read that it is memory owned by Windows, and that we should not tamper with it except through COM methods.
Then I stumbled upon the IMalloc COM interface with its Alloc method, which seems to allocate a chunk of memory in a COM-aware fashion, and that completed my confusion.
In order to not interfere with the heap structure maintained by e.g. C++ (assuming we are writing the COM server in C++), where exactly does IMalloc allocate the memory?
Windows used to create a dedicated heap for COM allocations; CoTaskMemAlloc() allocated from it directly. That was dropped in Windows 8, however: it now allocates from the default process heap, the one GetProcessHeap() returns. The Microsoft CRT was changed as well in VS2012; it used to have its own heap but now also uses the default process heap.
The exact reason these changes were made is murky to me; I never saw a good explanation for it. But it is not unlikely to have something to do with WinRT (aka UWP, aka Windows Store, aka Modern UI), which is heavily COM-powered under the hood but has pretty tight language-runtime integration provided by the language projection. Or it was just to bypass the constant trouble these different heaps used to cause. The CRT heap in particular was a DLL Hell nightmare, with programs failing miserably when they were rebuilt on a new VS version but still used an old DLL.
My answer to this question is: I don't know and I don't care.
What you do have to do though is to abide by the rules. COM (and COM objects) are free to allocate memory in any way they choose and any assumption you might make about where or how they do it is dangerous and unnecessary. Probably, in the end, it's ultimately allocated via HeapAlloc(), but it needn't be and even if it is you certainly don't know which heap.
Memory allocation by the client (via CoTaskMemAlloc(), say) when calling into a COM object is relatively rare. It's more common for the COM object to allocate whatever memory it needs in order to parcel up the results of the call and then return you a pointer - often in the form of another COM object - that you can use for whatever you need to do next. The API documentation for the method in question will tell you what to do when you are done with that pointer and that's all you ever need to know. This exact mechanism varies by API, for example:
For a COM object, call Release() on that object (this is usually implied, rather than being called out explicitly in the docs).
For a 'raw' pointer, the documentation might tell you to call CoTaskMemFree() or maybe IMalloc::Free().
For a SAFEARRAY, call SafeArrayUnaccessData() / SafeArrayUnlock() / SafeArrayDestroy().
Sometimes you need to call something a bit off-the-wall, such as SysFreeString().
Anyway - always - for any particular API, read the docs and you should be OK.
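Purely as an illustration of reading the docs and following the stated contract, here is what two of the more common cases look like in C++ (a sketch, not tied to any particular interface above):

    #include <windows.h>   // pulls in objbase.h (StringFromCLSID, CoTaskMemFree)
    #include <oleauto.h>   // SysAllocString, SysFreeString

    void release_examples()
    {
        // StringFromCLSID hands back a string from the COM task allocator;
        // its docs say to release it with CoTaskMemFree.
        GUID guid = {};                 // null GUID, purely for illustration
        LPOLESTR text = nullptr;
        if (SUCCEEDED(StringFromCLSID(guid, &text)))
            CoTaskMemFree(text);

        // BSTRs have their own allocate/free pair.
        BSTR name = SysAllocString(L"example");
        SysFreeString(name);
    }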
Related
It is well known that the freeing of heap memory must be done with the same allocator as the one used to allocate it. This is something to take into account when exchanging heap allocated objects across DLL boundaries.
One solution is to provide a destructor for each object, as in a C API: if a DLL allows creating object A, it will have to provide a function A_free or something similar [1].
Another related solution is to wrap all allocations in a shared_ptr, because it stores a link to its deleter [2] (see the sketch after this list).
Another solution is to "inject" a top-level allocator into all loaded DLLs (recursively) [3].
Another solution is to simply not exchange heap-allocated objects, and to use some kind of protocol instead [4].
Yet another solution is to be absolutely sure that the DLLs will share the same heap, which should (will?) happen if they share compatible compilation options (compiler, flags, runtime, etc.) [5][6].
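To make option 2 concrete, here is a minimal sketch assuming a hypothetical Widget type and a hypothetical widget_create/widget_destroy pair exported by the allocating DLL. Because the deleter is captured when the shared_ptr is created (inside the module that did the allocation), whichever module drops the last reference calls back into the matching deallocator:

    #include <cstdio>
    #include <memory>

    struct Widget { int value = 0; };

    // In the real scenario these would be exported by the DLL and would
    // allocate/free on that DLL's own heap.
    Widget* widget_create()           { return new Widget(); }
    void    widget_destroy(Widget* w) { delete w; }

    std::shared_ptr<Widget> make_widget()
    {
        // The deleter is stored inside the shared_ptr's control block.
        return std::shared_ptr<Widget>(widget_create(), &widget_destroy);
    }

    int main()
    {
        auto w = make_widget();
        std::printf("%d\n", w->value);
    }   // widget_destroy runs here, no matter which module held the last reference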
Guaranteeing identical compilation options across DLLs seems quite difficult, especially if one would like to use a package manager and not build everything at once.
Is there a way to check at runtime that the heaps are actually the same between multiple DLLs, preferably in a cross-platform way?
For reliability and ease of debugging, this seems better than hoping that the application will crash immediately rather than silently corrupting things.
I think the biggest problem here is your definition of "heap". That assumes there is a unique definition.
The problem is that Windows has HeapAlloc, while C++ typically uses "heap" to mean the memory allocated by ::operator new. These two can be the same, distinct, a subset, or partially overlapping.
With two DLLs, both might be written in C++ and use ::operator new, but each could have linked in its own unique version. So there might be multiple answers to the observation in the previous paragraph.
Now let's assume, for example, that one DLL has ::operator new forward directly to HeapAlloc, while the other counts allocations before calling HeapAlloc. Clearly the two can't formally be mixed, because the count kept by the second allocator would be wrong. But the code is so simple that both operator news are probably inlined, so at the assembly level you just have calls to HeapAlloc.
There's no way you can detect this at runtime, even if you were to disassemble the code on the fly (!): the inlined counter-increment instruction is not distinguishable from the surrounding code.
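For what it's worth, on Windows with the Microsoft CRT you can at least compare the CRT heap handles each module reports, for example via a hypothetical exported helper like the one below. Differing handles prove the heaps are distinct; equal handles merely suggest compatibility, and as explained above this cannot detect allocator logic (counters, debug bookkeeping) layered on top of the heap. It is not cross-platform either:

    #include <malloc.h>   // _get_heap_handle (MSVC CRT only)

    // Each DLL would export a helper like this; the host compares the values
    // returned by each loaded module.
    extern "C" __declspec(dllexport) intptr_t GetCrtHeapHandle()
    {
        return _get_heap_handle();   // handle of the CRT heap this module allocates from
    }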
I have a background in C# and Obj-C, so RC/GC are things I (still) hold dear to me. As I started learning C++ in more depth, I can't stop wondering why I would use normal pointers, when they are so unmanaged, instead of the alternative solutions.
shared_ptr provides a great way to store references without losing track of them and without having to delete them manually. I can see practical uses for normal pointers, but they just seem like bad practice.
Can someone make a case for these alternatives?
Of course you're encouraged to use shared and unique pointers IF they are owning pointers. If you only need an observer, however, a raw pointer will do just fine (a pointer bearing no responsibility for whatever it points to).
There is basically no overhead for std::unique_ptr, and there is some for std::shared_ptr since it does reference counting for you, but you will rarely need to save on execution time here.
Also, there is no need for "smart" pointers if you can guarantee lifetime / ownership hierarchy by design, say a parent node in a tree outliving its children - although this comes back to whether the pointer actually owns something.
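A small sketch of that ownership-by-design case: the tree owns its nodes through unique_ptr, and each child keeps a plain non-owning pointer back to its parent, which is safe precisely because the parent outlives its children:

    #include <memory>
    #include <vector>

    struct Node
    {
        Node* parent = nullptr;                        // non-owning observer
        std::vector<std::unique_ptr<Node>> children;   // owning

        Node* addChild()
        {
            children.push_back(std::make_unique<Node>());
            children.back()->parent = this;            // no delete ever goes through 'parent'
            return children.back().get();
        }
    };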
As someone else mentioned, in C++ you have to consider ownership. That being said, the 3D networked multiplayer FPS I'm currently working on has an official rule called "No new or delete." It uses only shared and unique pointers for designating ownership, and raw pointers retrieved from them (using .get()) everywhere we need to interact with C APIs. The performance hit is not noticeable. I use this as an example to illustrate the negligible performance hit, since games/simulations typically have the strictest performance requirements.
This has also significantly reduced the amount of time spent debugging and hunting down memory leaks. In theory, a well-designed application would never run into these problems. In real life working with deadlines, legacy systems, or existing game engines that were poorly designed, however, they are an inevitability on large projects like games... unless you use smart pointers. If you must dynamically allocate, don't have ample time for designing/rewriting the architecture or debugging problems related to resource management, and you want to get it off the ground as quickly as possible, smart pointers are the way to go and incur no noticeable performance cost even in large-scale games.
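A minimal sketch of that style, with the smart pointer owning a buffer and only the raw pointer from .get() handed to a C API (fread here; the file name is made up):

    #include <cstdio>
    #include <memory>

    int main()
    {
        // The smart pointer owns the buffer...
        auto buffer = std::make_unique<unsigned char[]>(4096);

        // ...and only the raw pointer from .get() is handed to the C API,
        // which takes no part in ownership.
        if (std::FILE* f = std::fopen("data.bin", "rb"))
        {
            std::fread(buffer.get(), 1, 4096, f);
            std::fclose(f);
        }
    }   // the buffer is freed automatically here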
The question is more the opposite: why should you use smart pointers? Unlike C# (and I think Obj-C), C++ makes extensive use of value semantics, which means that unless an object has an application specific lifetime (in which case, none of the smart pointers apply), you will normally be using value semantics, and no dynamic allocation. There are exceptions, but if you make it a point of thinking in terms of value semantics (e.g. like int), of defining appropriate copy constructors and assignment operators where necessary, and not allocating dynamically unless the object has an externally defined lifetime, you'll find that you rarely need to do anything more; everything just takes care of itself. Using smart pointers is very much the exception in most well written C++.

Sometimes you need to interoperate with C APIs, in which case you'll need to use raw pointers for at least those parts of the code.
In embedded systems, pointers are often used to access registers of hardware chips or memory at specific addresses.
Since the hardware registers already exist, there is no need to dynamically allocate them, and there is no memory leak if the pointer is reassigned or simply discarded.
Similarly with function pointers: functions are not dynamically allocated and have "fixed" addresses. Reassigning a function pointer, or letting it go out of scope, will not cause a memory leak.
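A short sketch of both cases; the register address is made up for illustration, and on real hardware it would come from the chip's datasheet or a vendor header:

    #include <cstdint>

    // Memory-mapped status register at a fixed (hypothetical) address.
    volatile std::uint32_t* const STATUS_REG =
        reinterpret_cast<volatile std::uint32_t*>(0x40021000);

    int square(int x) { return x * x; }

    void example()
    {
        std::uint32_t status = *STATUS_REG;   // nothing was allocated, nothing to free
        int (*op)(int) = &square;             // function pointer: fixed address, no ownership
        op = nullptr;                         // reassigning it leaks nothing
        (void)status;
    }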
Are there situations where an object needs to be created at a specific memory address? When might that be necessary (an example)?
Thank you.
Take a look at placement new: "What uses are there for 'placement new'?"
Good examples are writing your own memory allocator or garbage collector, or laying out memory precisely for cache performance.
It's a niche thing, but sometimes very useful.
You seem to be asking if in a C++ application it's ever necessary to construct an object at a specific address.
Normally, no. But there are exceptions, and the C++ language does support it.
One such exception is when building a kind of caching system for small objects in order to avoid frequent small allocations. One would first allocate a large buffer, and then, when client code wants to construct a new small object, the caching system constructs it within this large buffer.
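A minimal placement-new sketch of that idea (Widget is just a stand-in type): storage comes from a pre-allocated buffer, the object is constructed in place, and it is destroyed by calling the destructor explicitly rather than with delete:

    #include <new>      // placement new

    struct Widget
    {
        int id;
        explicit Widget(int id) : id(id) {}
    };

    void example()
    {
        alignas(Widget) unsigned char storage[sizeof(Widget)];   // the "large buffer", scaled down

        Widget* w = new (storage) Widget(42);   // construct at that specific address

        // ... use w ...

        w->~Widget();   // destroy explicitly; never call delete on a placement-new'd object
    }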
In C++, one may need to construct an object at a specific given address when implementing a pool allocator. For example, Boost Pool: http://www.boost.org/doc/libs/1_47_0/libs/pool/doc/index.html
I am trying to come up with design candidates for a project I am currently working on. Its client interface is based on WCF services exposing public methods and callbacks. Requests are routed all the way down to C++ libraries (that use Boost) which perform calculations, operations, etc.
The current scheme is based on a WCF Service talking to a separate native C++ process via IPC.
To make things a little simpler, there is a recommendation around here to go mixed-mode (i.e. to have a single .NET process which loads the native C++ layer inside it, most likely communicating to it via a very thin C++/CLI layer). The main concern is whether garbage collection or other .NET aspects would hinder the performance of the unmanaged C++ part of the process.
I started looking up the concepts of safe points and GC helper methods (e.g. KeepAlive(), etc.), but I couldn't find any direct discussion of this or any benchmarks. From what I understand so far, one of the safe points is when a thread is executing unmanaged code, and in this case garbage collection does not suspend any threads (is this correct?) to perform the cleanup.
I guess the main question I have is whether there is a performance concern on the native side when running these two types of code in the same process vs. having separate processes.
If you have a thread that has never executed any managed code, it will not be frozen during .NET garbage collection.
If a thread which uses managed code is currently running in native code, the garbage collector won't freeze it, but instead mark the thread to stop when it next reaches managed code. However, if you're thinking of a native dispatch loop that doesn't return for a long time, you may find that you're blocking the garbage collector (or leaving stuff pinned causing slow GC and fragmentation). So I recommend keeping your threads performing significant tasks in native code completely pure.
Making sure that the compiler isn't silently generating MSIL for some standard C++ code (thereby making it execute as managed code) is a bit tricky. But in the end you can accomplish this with careful use of #pragma managed(push, off).
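For example, in a translation unit compiled with /clr, something along these lines keeps a hot loop native (the function name is made up):

    #include <cstddef>

    // Compiled with /clr: functions are emitted as MSIL by default.
    #pragma managed(push, off)

    // Everything between push and pop is compiled to native code instead.
    void ScaleBufferNative(float* data, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            data[i] *= 2.0f;
    }

    #pragma managed(pop)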
It is very easy to get a mixed mode application up and running, however it can be very hard to get it working well.
I would advise thinking carefully before choosing that design - in particular about how you layer your application and the sort of lifetimes you expect for your unmanaged objects. A few thoughts from past experiences:
C++ object lifetime - by architecture.
Use C++ objects briefly in local scope then dispose of them immediately.
It sounds obvious but is worth stating: C++ objects are unmanaged resources that are designed to be used as unmanaged resources. Typically they expect deterministic creation and destruction, often making extensive use of RAII. This can be very awkward to control from a managed program. The IDisposable pattern exists to try to solve this. It can work well for short-lived objects but is rather tedious and difficult to get right for long-lived objects. In particular, if you start making unmanaged objects members of managed classes rather than things that live in function scope only, very quickly every class in your program has to be IDisposable, and suddenly managed programming becomes harder than unmanaged programming.
The GC is too aggressive.
Always worth remembering that when we talk about managed objects going out of scope, we mean in the eyes of the IL compiler/runtime, not the language you are reading the code in. If an unmanaged object is kept around as a member and a managed object is designed to delete it, things can get complicated. If your dispose pattern is not complete from top to bottom of your program, the GC can get rather aggressive. Say, for example, you write a managed class which deletes an unmanaged object in its finalizer, and the last thing you do with the managed object is access the unmanaged pointer to call a method. The GC may decide that the middle of that unmanaged call is a great time to collect the managed object, and suddenly your unmanaged pointer is deleted mid method call.
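A C++/CLI sketch of that failure mode and the usual band-aid (the types are hypothetical): without GC::KeepAlive, the wrapper can become eligible for collection, and its finalizer can delete the native object, while the native call is still in progress:

    // C++/CLI (/clr) sketch.
    class NativeThing
    {
    public:
        void DoWork() { /* long-running native work */ }
    };

    public ref class Wrapper
    {
        NativeThing* native;
    public:
        Wrapper() : native(new NativeThing()) {}
        ~Wrapper() { this->!Wrapper(); }                 // Dispose
        !Wrapper() { delete native; native = nullptr; }  // finalizer, run by the GC

        void DoWork()
        {
            native->DoWork();             // 'this' may already be unreachable during this call
            System::GC::KeepAlive(this);  // keeps the wrapper (and its finalizer) alive until here
        }
    };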
The GC is not aggressive enough.
If you are working within address space constraints (e.g. you need a 32-bit version) then you need to remember that the GC holds on to memory unless it thinks it needs to let go, and its only input to that decision is the managed world. If the unmanaged allocator needs space, there is no connection to the GC; an unmanaged allocation can fail simply because the GC hasn't collected objects that are long out of scope. There is a memory pressure API, but again it is only really usable/useful for quite simple designs.
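For the simple case it does cover, the memory pressure API looks roughly like this C++/CLI sketch (the class is hypothetical): a managed wrapper that owns a known-size native buffer tells the GC about it, so collections are triggered sooner than the tiny managed footprint alone would suggest:

    // C++/CLI (/clr) sketch.
    public ref class NativeBuffer
    {
        unsigned char* data;
        long long size;
    public:
        NativeBuffer(long long bytes) : data(new unsigned char[bytes]), size(bytes)
        {
            // Tell the GC roughly how much unmanaged memory this object holds.
            System::GC::AddMemoryPressure(size);
        }
        ~NativeBuffer() { this->!NativeBuffer(); }   // Dispose
        !NativeBuffer()                              // finalizer
        {
            delete[] data;
            data = nullptr;
            System::GC::RemoveMemoryPressure(size);  // must balance AddMemoryPressure
        }
    };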
Buffer copying. You also need to think about where to allocate any large memory blocks. Managed blocks can be pinned to look like unmanaged blocks. Unmanaged blocks can only ever be copied if they need to look like managed blocks. However when will that large managed block actually get released?
We have a front end written in Visual Basic 6.0 that calls several back end DLLs written in mixed C/C++. The problem is that each DLL appears to have its own heap and one of them isn’t big enough. The heap collides with the program stack when we’ve allocated enough memory.
Each DLL is written entirely in C, except for the basic DLL wrapper, which is written in C++. Each DLL has a handful of entry points. Each entry point immediately calls a C routine. We would like to increase the size of the heap in the DLL, but haven’t been able to figure out how to do that. I searched for guidance and found these MSDN articles:
http://msdn.microsoft.com/en-us/library/hh405351(v=VS.85).aspx
These articles are interesting but provide conflicting information. In our problem it appears that each DLL has its own heap. This matches the “Heaps: Pleasures and Pains” article, which says that the C Run-Time (CRT) library creates its own heap on startup. The “Managing Heap Memory” article says that the CRT library allocates out of the default process heap. The “Memory management options in Win32” article says the behavior depends on the version of the CRT library being used.
We’ve temporarily solved the problem by allocating memory from a private heap. However, in order to improve the structure of this very large, complex program, we want to switch from C with a thin C++ wrapper to real C++ with classes. We’re pretty certain that the new and delete operators won’t allocate memory from our private heap, and we’re wondering how to control the size of the heap C++ uses to allocate objects in each DLL. The application needs to run on all desktop versions of Windows NT, from 2000 through 7.
The Question
Can anyone point us to definitive and correct documentation that explains how to control the size of the heap C++ uses to allocate objects?
Several people have asserted that stack corruption due to heap allocations overwriting the stack is impossible. Here is what we observed. The VB front end uses four DLLs that it loads dynamically. Each DLL is independent of the others and provides a handful of methods called by the front end. All the DLLs communicate via data structures written to files on disk. These data structures are all statically structured; they contain no pointers, just value types and fixed-size arrays of value types. The problem DLL is invoked by a single call in which a file name is passed. It is designed to allocate about 20 MB of data structures required to complete its processing. It does a lot of calculation, writes the results to disk, releases the 20 MB of data structures, and returns an error code. The front end then unloads the DLL.

While debugging the problem under discussion, we set a breakpoint at the beginning of the data structure allocation code, watched the memory values returned from the calloc calls, and compared them with the current stack pointer. We watched as the allocated blocks approached the stack. After the allocation was complete, the stack began to grow until it overlapped the heap. Eventually the calculations wrote into the heap and corrupted the stack. As the stack unwound it tried to return to an invalid address and crashed with a segmentation fault.
Each of our DLLs is statically linked to the CRT, so each DLL has its own CRT heap and heap manager. Microsoft says in http://msdn.microsoft.com/en-us/library/ms235460(v=vs.80).aspx:
Each copy of the CRT library has a separate and distinct state. As such, CRT objects such as file handles, environment variables, and locales are only valid for the copy of the CRT where these objects are allocated or set. When a DLL and its users use different copies of the CRT library, you cannot pass these CRT objects across the DLL boundary and expect them to be picked up correctly on the other side.

Also, because each copy of the CRT library has its own heap manager, allocating memory in one CRT library and passing the pointer across a DLL boundary to be freed by a different copy of the CRT library is a potential cause for heap corruption.
We don't pass pointers between DLLs. We aren't experiencing heap corruption, we are experiencing stack corruption.
OK, the question is:
Can anyone point us to definitive and correct documentation that explains how to control the size of the heap C++ uses to allocate objects?
I am going to answer my own question. I got the answer from reading Raymond Chen's blog The Old New Thing, specifically There's also a large object heap for unmanaged code, but it's inside the regular heap. In that article Raymond recommends Advanced Windows Debugging by Mario Hewardt and Daniel Pravat. This book has very specific information on both stack and heap corruption, which is what I wanted to know. As a plus it provides all sorts of information about how to debug these problems.
Could you please elaborate on this statement of yours:
The heap collides with the program stack when we’ve allocated enough memory.
If we're talking about Windows (or any other mature platform), this should not be happening: the OS makes sure that stacks, heaps, mapped files and other objects never intersect.
Also:
Can anyone point us to definitive and correct documentation that explains how to control the size of the heap C++ uses to allocate objects?
The heap size is not fixed on Windows: it grows as the application uses more and more memory, and it will keep growing until all the virtual address space available to the process is used. This is pretty easy to confirm: just write a simple test app which keeps allocating memory and counts how much has been allocated. On a default 32-bit Windows you'll reach almost 2 GB. Clearly the heap doesn't occupy all available space initially, so it must grow along the way.
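For instance, a trivial test along these lines (which deliberately leaks; the OS reclaims everything at exit) stops a little short of 2 GB in a default 32-bit process:

    #include <cstdio>
    #include <cstdlib>

    int main()
    {
        const std::size_t block = 1024 * 1024;   // 1 MB per allocation
        std::size_t blocks = 0;

        while (std::malloc(block) != nullptr)    // keep allocating until the heap can't grow
            ++blocks;

        std::printf("Allocated about %zu MB before failure\n", blocks);
    }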
Without many details about the "collision" it's hard to tell what's happening in your case. However, looking at the tags to this question prompts me to one possibility. It is possible (and happens quite often, unfortunately) that ownership of allocated memory areas is being passed between modules (DLLs in your case). Here's the scenario:
there are two DLLs: A and B. Both of them created their own heaps
the DLL A allocates an object in its heap and passes the pointer and ownership to B
the DLL B receives the pointer, uses the memory and deallocates the object
If the heaps are different, most heap managers do not check whether the memory region being deallocated actually belongs to them (mostly for performance reasons). So they would deallocate something which doesn't belong to them, and by doing so corrupt the other module's heap. This may (and often does) lead to a crash, but not always. Depending on your luck (and the particular heap manager implementation), this operation may change one of the heaps in such a way that the next allocation happens outside of the area where the heap is located.
This often happens when one module is managed code while the other is native. Since you have the VB6 tag on the question, I'd check whether this is the case.
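A hypothetical sketch of that scenario, with A.dll and B.dll each statically linked to its own CRT (the function names are made up):

    #include <cstdlib>

    // In A.dll - allocates on A's CRT heap.
    extern "C" __declspec(dllexport) char* A_CreateBuffer(std::size_t n)
    {
        return static_cast<char*>(std::malloc(n));
    }

    // In B.dll - frees with B's CRT heap manager.
    extern "C" __declspec(dllexport) void B_Consume(char* p)
    {
        /* ... use the buffer ... */
        std::free(p);   // wrong heap: may crash, or silently damage one of the heaps
    }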
If the stack grows large enough to hit the heap, a prematurely aborted stack overflow may be the problem: invalid data is passed that does not satisfy the exit condition of some recursion in the problem DLL (loop detection is missing or not working), so that a runaway recursion consumes a ridiculously large amount of stack space. One would expect such a DLL to terminate with a stack overflow exception, but perhaps because of compiler/linker optimizations or large foreign heap sizes it crashes elsewhere.
Heaps are created by the CRT. That is to say, the malloc heap is created by the CRT, and is unrelated to HeapCreate(). It's not used for large allocations, though, which are handed off to the OS directly.
With multiple DLLs, you might have multiple heaps (newer VC versions are better at sharing, but even VC6 had no problem if you used MSVCRT.DLL - that's shared)
The stack, on the other hand, is managed by the OS. Here you see why multiple heaps don't matter: The OS allocation for the different heaps will never collide with the OS allocation for the stack.
Mind you, the OS may allocate heap space close to the stack. The rule is just no overlap, after all, there's no guaranteed "unused separation zone". If you then have a buffer overflow, it could very well overflow into the stack space.
So, any solutions? Yes: move to VC2010. It has buffer security checks, implemented in quite an efficient way. They're the default even in release mode.