Debugging COM reference counters - c++

In a project, I'm talking to an object in a .EXE server (the object performs expensive queries for me which should be cached), and I seem to have gotten my reference counts wrong. This makes the server process free an object it still holds a reference to, which in turn makes the host process fail in curious and interesting ways that involve losing data and sending a bug report to the vendor.
Is there a way I can ask COM to raise some condition that is detectable in a debugger if a proxy object whose refcount has dropped to zero is used in some way?

It's not likely that this is possible using raw interfaces. The reference count is maintained by the COM server, and how it's implemented is up to the server - the implementation lives inside the server code, so unless you have the source and can debug the server, you have no way of getting at it.
However, it is likely your problem is caused by manually calling AddRef and Release. If that is the case, you can use a RAII/smart-pointer solution. ATL provides one, but if for whatever reason you can't use that, it's easy enough to create your own. Not only can you then add (or use the provided) debugging facilities to keep track of reference counting, you will also be much less likely to get it wrong in the first place.
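As a rough illustration, a minimal wrapper might look like this (purely a sketch - in real code you'd reach for ATL's CComPtr or similar) with simple logging so you can watch the counts from a debugger:

    #include <unknwn.h>
    #include <cstdio>
    #include <utility>

    // Minimal RAII wrapper for a COM interface pointer, with logging so the
    // AddRef/Release traffic is visible in the debugger output. Illustrative
    // only - prefer CComPtr / _com_ptr_t in production code.
    template <typename T>
    class RefPtr
    {
        T* p_ = nullptr;
    public:
        RefPtr() = default;
        explicit RefPtr(T* p) : p_(p)
        {
            if (p_) std::printf("AddRef  -> %lu\n", p_->AddRef());
        }
        RefPtr(const RefPtr& other) : RefPtr(other.p_) {}
        RefPtr& operator=(RefPtr other) { std::swap(p_, other.p_); return *this; }
        ~RefPtr()
        {
            if (p_) std::printf("Release -> %lu\n", p_->Release());
        }
        T* operator->() const { return p_; }
        T* get() const { return p_; }
    };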

Related

Component Object Model (COM): Where does IMalloc::Alloc allocate memory?

Reading into COM for some legacy project. So far, my understanding is that COM is just a binary specification, and that all implementing components (client and server) must stick to this specification. As long as we are handling COM interfaces with methods receiving and returning only simple value types, everything makes perfect sense to me.
However, there is also the possibility to send pointers to entire objects/variants (containing e.g. a SAFEARRAY) from/to COM objects, and I wonder where the memory of these param objects is allocated. I read that it is memory owned by windows, and that we should not tamper with it except through COM methods.
Then I stumbled upon the IMalloc COM interface with its Alloc method, which seems to allocate a chunk of memory in a COM-aware fashion, and which perfectly completed the confusion.
In order to not interfere with the heap structure maintained by e.g. C++ (assuming we are writing the COM server in C++), where exactly does IMalloc allocate the memory?
Windows used to create a dedicated heap for COM allocations; CoTaskMemAlloc() allocated from it directly. That, however, was dropped in Windows 8: it now allocates from the default process heap, the one GetProcessHeap() returns. The Microsoft CRT was changed as well in VS2012; it used to have its own heap but now also uses the default process heap.
The exact reason these changes were made is murky to me; I never saw a good explanation for it. But it is not unlikely to have something to do with WinRT (aka UWP, aka Windows Store, aka Modern UI), which is heavily COM-powered under the hood but has pretty tight language-runtime integration provided by the language projections. Or just to bypass the constant trouble these different heaps used to cause. The CRT heap especially was a DLL Hell nightmare, with programs failing miserably when they were rebuilt on a new VS version but still used an old DLL.
My answer to this question is: I don't know and I don't care.
What you do have to do though is to abide by the rules. COM (and COM objects) are free to allocate memory in any way they choose and any assumption you might make about where or how they do it is dangerous and unnecessary. Probably, in the end, it's ultimately allocated via HeapAlloc(), but it needn't be and even if it is you certainly don't know which heap.
Memory allocation by the client (via CoTaskMemAlloc(), say) when calling into a COM object is relatively rare. It's more common for the COM object to allocate whatever memory it needs in order to parcel up the results of the call and then return you a pointer - often in the form of another COM object - that you can use for whatever you need to do next. The API documentation for the method in question will tell you what to do when you are done with that pointer and that's all you ever need to know. This exact mechanism varies by API, for example:
For a COM object, call Release() on that object (this is usually implied, rather than being called out explicitly in the docs).
For a 'raw' pointer, the documentation might tell you to call CoTaskMemFree() or maybe IMalloc::Free().
For a SAFEARRAY, call SafeArrayUnaccessData() / SafeArrayUnlock() / SafeArrayDestroy().
Sometimes you need to call something a bit off-the-wall, such as SysFreeString().
Anyway - always - for any particular API, read the docs and you should be OK.
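To illustrate those rules with a couple of concrete cases (SHGetKnownFolderPath here is just a convenient example of a callee-allocates API):

    #include <windows.h>
    #include <shlobj.h>        // SHGetKnownFolderPath
    #include <knownfolders.h>  // FOLDERID_Documents

    void Examples()
    {
        // Callee allocates with CoTaskMemAlloc; the docs say the caller must
        // free the returned string with CoTaskMemFree.
        PWSTR path = nullptr;
        if (SUCCEEDED(SHGetKnownFolderPath(FOLDERID_Documents, 0, nullptr, &path)))
        {
            // ... use path ...
            CoTaskMemFree(path);
        }

        // BSTRs have their own allocator; they are released with SysFreeString.
        BSTR name = SysAllocString(L"example");
        // ... pass name to a COM method that does not take ownership ...
        SysFreeString(name);
    }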

Mixed-Mode Process vs. Managed-to-Unmanaged IPC

I am trying to come up with design candidates for a current project that I am working on. Its client interface is based on WCF Services exposing public methods and callbacks. Requests are routed all the way to C++ libraries (that use boost) that perform calculations, operations, etc.
The current scheme is based on a WCF Service talking to a separate native C++ process via IPC.
To make things a little simpler, there is a recommendation around here to go mixed-mode (i.e. to have a single .NET process which loads the native C++ layer inside it, most likely communicating to it via a very thin C++/CLI layer). The main concern is whether garbage collection or other .NET aspects would hinder the performance of the unmanaged C++ part of the process.
I started looking up the concepts of safe points and GC helper methods (e.g. KeepAlive(), etc.), but I couldn't find any direct discussion of this, or benchmarks. From what I understand so far, one of the safe points is when a thread is executing unmanaged code, and in this case garbage collection does not suspend any threads (is this correct?) to perform the cleanup.
I guess the main question I have is whether there is a performance concern on the native side when running these two types of code in the same process vs. having separate processes.
If you have a thread that has never executed any managed code, it will not be frozen during .NET garbage collection.
If a thread which uses managed code is currently running in native code, the garbage collector won't freeze it, but will instead mark the thread to stop when it next reaches managed code. However, if you're thinking of a native dispatch loop that doesn't return for a long time, you may find that you're blocking the garbage collector (or leaving stuff pinned, causing slow GC and fragmentation). So I recommend keeping the threads that perform significant tasks in native code completely pure.
Making sure that the compiler isn't silently generating MSIL for some standard C++ code (thereby making it execute as managed code) is a bit tricky. But in the end you can accomplish this with careful use of #pragma managed(push, off).
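Roughly what that looks like in a translation unit compiled with /clr (the function names are placeholders):

    // Compiled with /clr: by default everything in this file becomes MSIL.
    #include <cstddef>

    #pragma managed(push, off)
    // Everything between push and pop is compiled to native code, so a thread
    // spending a long time in here behaves like a pure native thread.
    void NativeNumberCrunch(double* data, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            data[i] *= data[i];
    }
    #pragma managed(pop)

    // Back to managed (MSIL) compilation from here on.
    void ManagedEntryPoint(double* data, std::size_t count)
    {
        NativeNumberCrunch(data, count);
    }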
It is very easy to get a mixed mode application up and running, however it can be very hard to get it working well.
I would advise thinking carefully before choosing that design - in particular about how you layer your application and the sort of lifetimes you expect for your unmanaged objects. A few thoughts from past experiences:
C++ object lifetime - manage it through your architecture.
Use C++ objects briefly in local scope, then dispose of them immediately.
It sounds obvious but is worth stating: C++ objects are unmanaged resources and are designed to be used as such. Typically they expect deterministic creation and destruction - often making extensive use of RAII. This can be very awkward to control from a managed program. The IDisposable pattern exists to try and solve this. It can work well for short-lived objects but is rather tedious and difficult to get right for long-lived objects. In particular, if you start making unmanaged objects members of managed classes rather than things that live in function scope only, very quickly every class in your program has to be IDisposable, and suddenly managed programming becomes harder than unmanaged programming.
The GC is too aggressive.
It is always worth remembering that when we talk about managed objects going out of scope, we mean in the eyes of the IL compiler/runtime, not the language that you are reading the code in. If an unmanaged object is kept around as a member and a managed object is designed to delete it, things can get complicated. If your dispose pattern is not complete from top to bottom of your program, the GC can get rather aggressive. Say, for example, you write a managed class which deletes an unmanaged object in its finaliser, and the last thing you do with the managed object is access the unmanaged pointer to call a method. The GC may decide that during that unmanaged call is a great time to collect the managed object. Suddenly your unmanaged pointer is deleted in the middle of the method call.
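A C++/CLI sketch of that failure mode and the usual fix via GC::KeepAlive (the class and method names are made up for illustration):

    // C++/CLI, compiled with /clr. NativeThing is a stand-in for whatever
    // unmanaged class is being wrapped.
    struct NativeThing
    {
        void LongRunningCall() { /* expensive native work */ }
    };

    public ref class Wrapper
    {
        NativeThing* native_;
    public:
        Wrapper() : native_(new NativeThing()) {}
        ~Wrapper() { this->!Wrapper(); }                    // Dispose()
        !Wrapper() { delete native_; native_ = nullptr; }   // finaliser

        void DoWork()
        {
            NativeThing* p = native_;
            // 'this' is not referenced after the line above, so the GC is free
            // to finalise the Wrapper while LongRunningCall() is still running,
            // deleting *p out from under the call.
            p->LongRunningCall();
            // Keeping a reference alive until the native call returns prevents
            // that premature collection.
            System::GC::KeepAlive(this);
        }
    };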
The GC is not aggressive enough.
If you are working within address constraints (e.g. you need a 32 bit version) then you need to remember that the GC holds on to memory unless it thinks it needs to let go. Its only input to these thoughts is the managed world. If the unmanaged allocator needs space there is no connection to the GC. An unmanaged allocation can fail simply because the GC hasn't collected objects that are long out of scope. There is a memory pressure API but again it is only really usable/useful for quite simple designs.
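The memory pressure API being referred to is GC::AddMemoryPressure / GC::RemoveMemoryPressure; a minimal C++/CLI sketch (the wrapper class is hypothetical):

    // C++/CLI, compiled with /clr. A tiny managed object owning a large
    // unmanaged allocation; without the pressure calls the GC only "sees"
    // the handful of managed bytes.
    public ref class BigNativeBuffer
    {
        unsigned char* data_;
        long long size_;
    public:
        BigNativeBuffer(long long size) : size_(size)
        {
            data_ = new unsigned char[size];
            System::GC::AddMemoryPressure(size_);
        }
        ~BigNativeBuffer() { this->!BigNativeBuffer(); }
        !BigNativeBuffer()
        {
            if (data_ != nullptr)
            {
                delete[] data_;
                data_ = nullptr;
                System::GC::RemoveMemoryPressure(size_);
            }
        }
    };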
Buffer copying. You also need to think about where to allocate any large memory blocks. Managed blocks can be pinned to look like unmanaged blocks. Unmanaged blocks can only ever be copied if they need to look like managed blocks. However when will that large managed block actually get released?

Is it safe to send a pointer to a static function over the network?

I was thinking about some RPC code that I have to implement in C++, and I wondered if it's safe (and under which assumptions) to send a pointer to a function over the network to a process running the same binary code (assuming the binaries are exactly the same and that they are running on the same architecture). I guess virtual memory is what could make the difference here.
I'm asking just out of curiosity, since it's a bad design in any case, but I would like to know if it's theoretically possible (and whether it extends to other kinds of pointers to static data, other than functions, that the program may include).
In general, it's not safe for many reasons, but there are limited cases in which it will work. First of all, I'm going to assume you're using some sort of signing or encryption in the protocol that ensures the integrity of your data stream; if not, you have serious security issues already that are only compounded by passing around function pointers.
If the exact same program binary is running on both ends of the connection, if the function is in the main program (or in code linked from a static library) and not in a shared library, and if the program is not built as a position-independent executable (PIE), then the function pointer will be the same on both ends and passing it across the network should work. Note that these are very stringent conditions that would have to be documented as part of using your program, and they're very fragile; for instance if somebody upgrades the software on one side and forgets to upgrade the version on the other side of the connection at the same time, things will break horribly and dangerously.
I would avoid this type of low-level RPC entirely in favor of a higher-level command structure or abstract RPC framework, but if you really want to do it, a slightly safer approach would be to pass function names and use dlsym or equivalent to look them up. If the symbols reside in the main program binary rather than libraries, then depending on your platform you might need -rdynamic (GCC) or a similar option to make them available to dlsym. libffi might also be a useful tool for abstracting this.
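A sketch of the look-up-by-name approach on a POSIX system (the handler signature and names are hypothetical; link with -ldl, and use -rdynamic if the symbols live in the main executable):

    #include <dlfcn.h>
    #include <cstdio>
    #include <string>

    typedef void (*handler_fn)(void);

    // Resolve a function by the name received from the peer instead of trusting
    // a raw pointer value coming off the wire.
    handler_fn resolve_handler(const std::string& name)
    {
        void* self = dlopen(nullptr, RTLD_NOW);   // handle to the main program
        void* sym  = self ? dlsym(self, name.c_str()) : nullptr;
        if (!sym)
        {
            std::fprintf(stderr, "unknown handler: %s\n", name.c_str());
            return nullptr;
        }
        // Converting void* to a function pointer is only conditionally supported
        // by the standard, but it is well defined on POSIX platforms.
        return reinterpret_cast<handler_fn>(sym);
    }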
Also, if you want to avoid depending on dlsym or libffi, you could keep your own "symbol table" hard-coded in the binary as a static const linear table or hash table mapping symbol names to function pointers. The hash table format used in ELF for this purpose is very simple to understand and implement, so I might consider basing your implementation on that.
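And a minimal sketch of the hand-rolled symbol table variant (everything here is made up for illustration; a linear scan is fine for a handful of entries):

    #include <cstdio>
    #include <cstring>

    // The only functions a peer is allowed to name.
    void ping()          { std::puts("ping"); }
    void reload_config() { std::puts("reload"); }
    void dump_stats()    { std::puts("stats"); }

    typedef void (*handler_fn)(void);

    struct SymbolEntry
    {
        const char* name;
        handler_fn  fn;
    };

    static const SymbolEntry kSymbols[] = {
        { "ping",          &ping },
        { "reload_config", &reload_config },
        { "dump_stats",    &dump_stats },
    };

    handler_fn lookup(const char* name)
    {
        for (const SymbolEntry& e : kSymbols)
            if (std::strcmp(e.name, name) == 0)
                return e.fn;
        return nullptr;   // unknown name: reject the request
    }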
What is it a pointer to?
Is it a pointer to a piece of static program memory? If so, don't forget that it's an address, not an offset, so you'd first need to convert between the two accordingly.
Second, if it's not a piece of static memory (i.e. a statically allocated array created at build time as opposed to run time), it's not really possible at all.
Finally, how are you ensuring the two pieces of code are the same? Are both binaries bit-identical (e.g. diff -a binary1 binary2)? Even if they are bit-identical, depending on the virtual memory management on each machine, the entire program's memory segment may not exist in a single page, or the alignment across multiple pages may be different for each system.
This is really a bad idea, no matter how you slice it. This is what message passing and APIs are for.
I don't know of any form of RPC that will let you send a pointer over the network (at least without doing something like casting to int first). If you do convert to int on the sending end, and convert that back to a pointer on the far end, you get pretty much the same as converting any other arbitrary int to a pointer: undefined behavior if you ever attempt to dereference it.
Normally, if you pass a pointer to an RPC function, it'll be marshalled -- i.e., the data it points to will be packaged up, sent across, put into memory, and a pointer to that local copy of the data passed to the function on the other end. That's part of why/how IDL gets a bit ugly -- you need to tell it how to figure out how much data to send across the wire when/if you pass a pointer. Most IDL compilers know about zero-terminated strings; for other types of arrays, you typically need to specify the size of the data (somehow or other).
This is highly system dependent. On systems with virtual addressing such that each process thinks it's running at the same address each time it executes, this could plausibly work for executable code. Darren Kopp's comment and link regarding ASLR is interesting - a quick read of the Wikipedia article suggests the Linux & Windows versions focus on data rather than executable code, except for "network facing daemons" on Linux, and on Windows it applies only when "specifically linked to be ASLR-enabled".
Still, "same binary code" is best assured by static linking - if different shared objects/libraries are loaded, or they're loaded in different order (perhaps due to dynamic loading - dlopen - driven by different ordering in config files or command line args etc.) you're probably stuffed.
Sending a pointer over the network is generally unsafe. The two main reasons are:
Reliability: the data/function pointer may not point to the same entity (data structure or function) on the other machine, due to a different location of the program, its libraries, or dynamically allocated objects in memory. Relocatable code + ASLR can break your design. At the very least, if you want to point to a statically allocated object or a function, you should send its offset w.r.t. the image base if your platform is Windows, or do something similar on whatever OS you are using (see the sketch after this list).
Security: if your network is open and there's a hacker (or they have broken into your network), they can impersonate your first machine and make the second machine either hang or crash, causing a denial of service, or execute arbitrary code and get access to sensitive information or tamper with it or hijack the machine and turn it into an evil bot sending spam or attacking other computers. Of course, there are measures and countermeasures here, but...
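A sketch of sending an image-base-relative offset instead of a raw pointer on Windows (the encode/decode helpers are made up; only GetModuleHandleW is a real API here):

    #include <windows.h>
    #include <cstdint>

    void SomeStaticFunction() { /* ... */ }

    // Sending side: encode the function as an offset from the module base, which
    // stays valid across relocation/ASLR as long as both sides run the same image.
    std::uint64_t EncodeAsOffset(void* fn)
    {
        auto base = reinterpret_cast<std::uintptr_t>(GetModuleHandleW(nullptr));
        return reinterpret_cast<std::uintptr_t>(fn) - base;
    }

    // Receiving side: turn the offset back into a local pointer.
    void* DecodeFromOffset(std::uint64_t offset)
    {
        auto base = reinterpret_cast<std::uintptr_t>(GetModuleHandleW(nullptr));
        return reinterpret_cast<void*>(base + static_cast<std::uintptr_t>(offset));
    }

    // Usage: EncodeAsOffset(reinterpret_cast<void*>(&SomeStaticFunction))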
If I were you, I'd design something different. And I'd ensure that the transmitted data is either unimportant or encrypted and the receiving part does the necessary validation of it prior to using it, so there are no buffer overflows or execution of arbitrary things.
If you're looking for some formal guarantees, I cannot help you. You would have to look in the documentation of the compiler and OS that you're using - however, I doubt that you would find the necessary guarantees there, except possibly for some specialized embedded-systems OSes.
I can however provide you with one scenario where I'm 99.99% sure that it will work without any problems:
Windows
32 bit process
Function is located in a module that doesn't have relocation information
The module in question is already loaded & initialized on the client side
The module in question is 100% identical on both sides
A compiler that doesn't do very crazy stuff (e.g. MSVC and GCC should both be fine)
If you want to call a function in a DLL, you might run into problems. As per the list above, the module (i.e. the DLL) may not have relocation information, which of course makes it impossible to relocate it (which is what we need). Unfortunately that also means that loading the DLL will fail if the "preferred load address" is already used by something else. So that would be kind of risky.
If the function resides in the EXE however, you should be fine. A 32 bit EXE doesn't need relocation information, and most don't include it (MSVC default settings). BTW: ASLR is not an issue here since a) ASLR does only move modules that are tagged as wanting to be moved and b) ASLR could not move a 32 bit windows module without relocation information, even if it wanted to.
Most of the above just makes sure that the function will have the same address on both sides. The only remaining question - at least that I can think of - is: is it safe to call a function via a pointer that we initialized by memcpy-ing over some bytes that we received from the network, assuming that the byte-pattern is the same that we would have gotten if we had taken the address of the desired function? That surely is something that the C++ standard doesn't guarantee, but I don't expect any real-world problems from current real-world compilers.
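In code, the step being described is roughly this (a sketch only; as noted, the C++ standard gives no guarantee here):

    #include <cstring>

    // wire_bytes: the raw bytes of a function pointer, received from a peer that
    // took the address of the same function in the same, non-relocated EXE.
    void CallReceivedPointer(const unsigned char* wire_bytes)
    {
        void (*fn)() = nullptr;
        // Reconstitute the pointer from the byte pattern. Formally this is not
        // guaranteed by the standard, but with identical images on both sides
        // the bit pattern matches what &function would have produced locally.
        std::memcpy(&fn, wire_bytes, sizeof fn);
        fn();
    }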
That being said, I would not recommend doing that, except for situations where security and robustness really aren't important.

Why implement DB connection pointer object as a reference counting pointer? (C++)

At our company one of the core C++ classes (Database connection pointer) is implemented as a reference counting pointer. To be clear, the objects are NOT DB connections themselves, but pointers to a DB connection object.
The library is very old, and nobody who designed it is around anymore.
So far, neither I nor any of the C++ experts in the company that I asked have come up with a good reason why this particular design was chosen. Any ideas?
It is introducing some problems (partially due to the awful reference-counted pointer implementation used), and I'm trying to understand whether this design actually has some deep underlying reason.
The usage pattern these days seems to be that the DB connection pointer object is returned by a DB connection manager class, and it's somewhat unclear whether DB connection pointers were designed to be able to be used independently of DB connection manager.
Probably it's a mistake. Without looking at the code it's impossible to know for sure, but the quality of the reference-counted pointer implementation is suggestive. Poor design, especially around resource management, is not unheard of in the C++ community</bitter sarcasm>.
With that said, reference-counted pointers are useful when you have objects of indeterminate lifetime which are very expensive to create, or whose state needs to be shared among several users. Depending on the underlying architecture, database connections could fit this definition: if each database connection needs to authenticate over the global internet, say, it could easily be worth your trouble to save a single connection and reuse it, rather than making new connections and disposing of them as you go.
But if I've understood you correctly, you don't have a single database connection object with a collection of refcounted pointers pointing to it. Rather, you have a database connection object, a collection of ordinary pointers to it, and a collection of refcounted pointers to those pointers. This is insanity, and almost certainly the result of confused thinking by the original developers. The alternative is that it was an act of deliberate evil, e.g. to ensure job security. If so they must have failed, as none of the perpetrators still work for your company.
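In modern terms, the distinction the answer is drawing is roughly this (the class name is a placeholder):

    #include <memory>

    struct DBConnection { /* expensive to create: authenticates, etc. */ };

    // What reference counting is normally for: many users share one expensive
    // connection, and it is closed when the last user lets go of it.
    std::shared_ptr<DBConnection> shared = std::make_shared<DBConnection>();

    // What the question seems to describe: a reference-counted handle to a
    // *pointer* to the connection - an extra level of indirection whose count
    // says nothing about the lifetime of the connection itself.
    std::shared_ptr<DBConnection*> handle =
        std::make_shared<DBConnection*>(new DBConnection());   // and who deletes this?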

The Price of DuplicateHandle

I'm writing a class library that provides convenient object-oriented frontends to the C API that is the Windows Registry. I'm curious, however, what the best course of action is for handling HKEYs in instances where my key class is copied.
I can either
Allocate a heap integer and use it as a reference count. Call RegCloseKey() on the handle and deallocate the integer when the reference count reaches zero.
Use the built-in functionality of handles and, rather than maintaining a reference count, call DuplicateHandle() on the HKEY when the registry key object is copied. Then always call RegCloseKey() in destructors.
The DuplicateHandle() design is much simpler, but I'm concerned if designing things that way is going to severely hamper performance of the application. Because my application recurses through hundreds of thousands of keys, speed of copying this object is a sensitive issue.
What are the inherent overheads of the DuplicateHandle() function?
I suspect you'll find that DuplicateHandle has very little overhead. The kernel already manages a reference count for each open object, and DuplicateHandle adds a new entry to the kernel handle table for the destination process, and increments the object reference count. (DuplicateHandle also normally does security checks, although it may skip that if the source and destination processes are the same.)
You may run into difficulties if you open hundreds of thousands of objects at the same time, depending on how many handles Windows feels like letting you open.
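For what it's worth, the DuplicateHandle()-based copy the question describes would look roughly like this (the RegKey class itself is hypothetical):

    #include <windows.h>

    // RAII wrapper where every copy owns its own duplicated handle.
    class RegKey
    {
        HKEY key_ = nullptr;
    public:
        explicit RegKey(HKEY key) : key_(key) {}

        RegKey(const RegKey& other)
        {
            if (other.key_ != nullptr)
            {
                HANDLE dup = nullptr;
                if (DuplicateHandle(GetCurrentProcess(),
                                    reinterpret_cast<HANDLE>(other.key_),
                                    GetCurrentProcess(),
                                    &dup,
                                    0,       // ignored with DUPLICATE_SAME_ACCESS
                                    FALSE,   // not inheritable
                                    DUPLICATE_SAME_ACCESS))
                {
                    key_ = reinterpret_cast<HKEY>(dup);
                }
            }
        }

        RegKey& operator=(const RegKey&) = delete;   // left out of the sketch

        ~RegKey()
        {
            if (key_ != nullptr)
                RegCloseKey(key_);
        }

        HKEY get() const { return key_; }
    };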
I've never encountered any implication that DuplicateHandle() has unexpected overhead.
I suspect a few science experiments are in order to confirm that. Be sure to do that on several platforms, as that is the kind of thing Microsoft might alter without warning.