The Price of DuplicateHandle - c++

I'm writing a class library that provides convenient object-oriented frontends to the C API that is the Windows Registry. I'm curious, however, what the best course of action is for handling HREGs, for instances where my key class is copied.
I can either
Allocate a heap integer and use it as a reference count. Call RegCloseKey() on the handle and deallocate the integer when the refrence count equals zero.
Use the builtin functionality of handles, and rather than maintaining a reference count, call DuplicateHandle() on the HREG when the registry key object is copied. Then always call RegCloseKey in destructors.
The DuplicateHandle() design is much simpler, but I'm concerned if designing things that way is going to severely hamper performance of the application. Because my application recurses through hundreds of thousands of keys, speed of copying this object is a sensitive issue.
What are the inherent overheads of the DuplicateHandle() function?

I suspect you'll find that DuplicateHandle has very little overhead. The kernel already manages a reference count for each open object, and DuplicateHandle adds a new entry to the kernel handle table for the destination process, and increments the object reference count. (DuplicateHandle also normally does security checks, although it may skip that if the source and destination processes are the same.)
You may run into difficulties if you open hundreds of thousands of objects at the same time, depending on how many handles Windows feels like letting you open.

I've never encountered any implication that DuplicateHandle() has unexpected overhead.
I suspect a few science experiments are in order to confirm that. Be sure to do that on several platforms, as that is the kind of thing Microsoft might alter without warning.

Related

Mixed-Mode Process vs. Managed-to-Unmanaged IPC

I am trying to come up with design candidates for a current project that I am working on. Its client interface is based on WCF Services exposing public methods and call backs. Requests are routed all the way to C++ libraries (that use boost) that perform calculations, operations, etc.
The current scheme is based on a WCF Service talking to a separate native C++ process via IPC.
To make things a little simpler, there is a recommendation around here to go mixed-mode (i.e. to have a single .NET process which loads the native C++ layer inside it, most likely communicating to it via a very thin C++/CLI layer). The main concern is whether garbage collection or other .NET aspects would hinder the performance of the unmanaged C++ part of the process.
I started looking up concepts of safe points and and GC helper methods (e.g. KeepAlive(), etc.) but I couldn't find any direct discussion about this or benchmarks. From what I understand so far, one of the safe points is if a thread is executing unamanged code and in this case garbage collection does not suspend any threads (is this correct?) to perform the cleanup.
I guess the main question I have is there a performance concern on the native side when running these two types of code in the same process vs. having separate processes.
If you have a thread that has never executed any managed code, it will not be frozen during .NET garbage collection.
If a thread which uses managed code is currently running in native code, the garbage collector won't freeze it, but instead mark the thread to stop when it next reaches managed code. However, if you're thinking of a native dispatch loop that doesn't return for a long time, you may find that you're blocking the garbage collector (or leaving stuff pinned causing slow GC and fragmentation). So I recommend keeping your threads performing significant tasks in native code completely pure.
Making sure that the compiler isn't silently generating MSIL for some standard C++ code (thereby making it execute as managed code) is a bit tricky. But in the end you can accomplish this with careful use of #pragma managed(push, off).
It is very easy to get a mixed mode application up and running, however it can be very hard to get it working well.
I would advise thinking carefully before choosing that design - in particular about how you layer your application and the sort of lifetimes you expect for your unmanaged objects. A few thoughts from past experiences:
C++ object lifetime - by architecture.
Use C++ objects briefly in local scope then dispose of them immediately.
It sounds obvious but worth stating, C++ objects are unmanaged resources that are designed to be used as unmanaged resources. Typically they expect deterministic creation and destruction - often making extensive use of RAII. This can be very awkward to control from a a managed program. The IDispose pattern exists to try and solve this. This can work well for short lived objects but is rather tedious and difficult to get right for long lived objects. In particular if you start making unmanaged objects members of managed classes rather than things that live in function scope only, very quickly every class in your program has to be IDisposable and suddenly managed programming becomes harder than ummanaged programming.
The GC is too aggressive.
Always worth remembering that when we talk about managed objects going out of scope we mean in the eyes of the IL compiler/runtime not the language that you are reading the code in. If an ummanaged object is kept around as a member and a managed object is designed to delete it things can get complicated. If your dispose pattern is not complete from top to bottom of your program the GC can get rather aggressive. Say for example you try to write a managed class which deletes an unmanaged object in its finaliser. Say the last thing you do with the managed object is access the unmanaged pointer to call a method. Then the GC may decide that during that unmanaged call is a great time to collect the managed object. Suddenly your unmanaged pointer is deleted mid method call.
The GC is not aggressive enough.
If you are working within address constraints (e.g. you need a 32 bit version) then you need to remember that the GC holds on to memory unless it thinks it needs to let go. Its only input to these thoughts is the managed world. If the unmanaged allocator needs space there is no connection to the GC. An unmanaged allocation can fail simply because the GC hasn't collected objects that are long out of scope. There is a memory pressure API but again it is only really usable/useful for quite simple designs.
Buffer copying. You also need to think about where to allocate any large memory blocks. Managed blocks can be pinned to look like unmanaged blocks. Unmanaged blocks can only ever be copied if they need to look like managed blocks. However when will that large managed block actually get released?

Debugging COM reference counters

In a project, I'm talking to an object in a .EXE server (the object performs expensive queries for me which should be cached), and I seem to have gotten my reference counts wrong, which makes the server process free an object it still holds a reference to, making the host process fail in curious and interesting ways that involve losing data and sending a bug report to the vendor.
Is there a way I can ask COM to raise some condition that is detectable in a debugger if a proxy object whose refcount has dropped to zero is used in some way?
It's not likely that this is possible using raw interfaces - the reference count is maintained by the COM server and how it's impelmented is up to the server - the implementation is inside the server code, so unless you have the source and can debug the server, you have no way of getting to it.
However, it is likely your problem is cause by manually calling AddRef and Release. If that is the case, you can use a RAII/smart pointer solution. ATL provides one, but if for whatever reason you can't use that, it's easy enough to create your own. You can then not only create or use provided debugging facilities to keep track of reference counting, you will be much less likely to get it wrong in the first place.

Problems with Singleton Pattern

I've been reading about Singleton pattern for last few days. The general perception is that the scenarios where it is required are quite few (if not rare) probably because it has its own set of problems such as
In a garbage collection environment it can be an issue with regards to memory management.
In a multithreaded environment it can cause bottlenecks and introduce synchronization problems.
Headache from testing prespective.
I'm starting to get the ideas behind these issues but not totally sure about these concerns.
Like in case of garbage collection issue, usage of static in singleton implementation (which is inherent to the pattern), is that the concern? Since it would mean that the static instance will last till the application. Is it something that degrades memory management (it just means that the memory allocated to the singleton pattern won't be freed)?
Ofcourse in a multithreaded setup, having all the threads being in contention for the singleton instance would be a bottleneck. But how does usage of this pattern causes synchronization problems (surely we can use a mutex or something like that to synchronize access).
From a (unit?)testing perspective, since singletons use static methods (which are difficult to be mocked or stubbed) they can cause problems. Not sure about this. Can someone please elaborate on this testing concern?
Thanks.
In a garbage collection environment it can be an issue with regards to memory management
In typical singleton implementations, once you create the singleton you can never destroy it. This non-destructive nature is sometimes acceptable when the singleton is small. However, if the singleton is massive, then you are unnecessarily using more memory than you should.
This is a bigger issue in languages where you have a garbage collector (like Java, Python, etc) because the garbage collector will always believe that the singleton is necessary. In C++, you can cheat by delete-ing the pointer. However, this opens its own can of worms because it's supposed to be a singleton, but by deleting it, you are making it possible to create a second one.
In most cases, this over-use of memory does not degrade memory performance, but it can be considered the same as a memory leak. With a large singleton, you are wasting memory on your user's computer or device. (You can run into memory fragmentation if you allocate a huge singleton, but this is usually a non-concern).
In a multithreaded environment it can cause bottlenecks and introduce synchronization problems.
If every thread is accessing the same object and you are using a mutex, each thread must wait until another has unlocked the singleton. And if the threads depend greatly upon the singleton, then you will degrade performance to a single-thread environment because a thread spends most of its life waiting.
However, if your application domain allows it, you can create one object for each thread -- this way the thread does not spend time waiting and instead does the work.
Headache from testing prespective.
Notably, a singleton's constructor can only be tested once. You have to create an entirely new test suite in order to test the constructor again. This is fine if your constructor doesn't take any parameters, but once you accept a paremeter you can no longer effective unit teest.
Further, you can't stub out the singleton as effectively and your use of mock objects becomes difficult to use (there are ways around this, but it's more trouble than it's worth). Keep reading for more on this...
(And it leads to a bad design, too!)
Singletons are also a sign of a poor design. Some programmers want to make their database class a singleton. "Our application will never use two databases," they typically think. But, there will come a time when it may make sense to use two databases, or unit testing you will want to use two different SQLite databases. If you used a singleton, you will have to make some serious changes to your application. But if you used regular objects from the start, you can take advantage of OOP to get your task done efficiently and on-time.
Most cases of singleton's are the result of the programmer being lazy. They do not wish to pass around an object (eg, database object) to a bunch of methods, so they create a singleton that each method uses as an implicit parameter. But, this approach bites for the reasons above.
Try to never use a singleton, if you can. Although they may seem like a good approach from the start, it usually always leads to poor design and hard to maintain code down the line.
If you haven't seen the article Singletons are Pathological Liars, you should read that too. It discusses how the interconnections between singletons are hidden from the interface, so the way you need to construct software is also hidden from the interface.
There are links to a couple of other articles on singletons by the same author.
When evaluating the Singleton pattern, you have to ask "What's the alternative? Would the same problems happen if I didn't use the Singleton pattern?"
Most systems have some need for Big Global Objects. These are items which are large and expensive (eg Database Connection Managers), or hold pervasive state information (for example, locking information).
The alternative to a Singleton is to have this Big Global Object created on startup, and passed as a parameter to all of the classes or methods that need access to this object.
Would the same problems happen in the non-singleton case? Let's examine them one by one:
Memory Management: The Big Global Object would exist when the application was started, and the object will exist until shutdown. As there is only one object, it will take up exactly the same amount of memory as the singleton case. Memory usage is not an issue. (#MadKeithV: Order of destruction at shutdown is a different issue).
Multithreading and bottlenecks: All of the threads would need to access the same object, whether they were passed this object as a parameter or whether they called MyBigGlobalObject.GetInstance(). So Singleton or not, you would still have the same synchronisation issues, (which fortunately have standard solutions). This isn't an issue either.
Unit testing: If you aren't using the Singleton pattern, then you can create the Big Global Object at the start of each test, and the garbage collector will take it away when the test completes. Each test will start with a new, clean environment that's unnaffected by the previous test. Alternatively, in the Singleton case, the one object lives through ALL of the tests, and can easily become "contaminated". So yes, the Singleton pattern really bites when it comes to unit testing.
My preference: because of the unit testing issue alone, I tend to avoid the Singleton pattern. If it's one of the few environments where I don't have unit testing (for example, the user interface layer) then I might use Singletons, otherwise I avoid them.
My main argument against singletons is basically that they combine two bad properties.
The things you mention can be a problem, sure, but they don't have to be. The synchronization thing can be fixed, it only becomes a bottleneck if many threads frequently access the singleton, and so on. Those issues are annoying, but not deal-breakers.
The much more fundamental problem with singletons is that what they're trying to do is fundamentally bad.
A singleton, as defined by the GoF, has two properties:
It is globally accessible, and
It prevents the class from ever being instantiated more than once.
The first one should be simple. Globals are, generally speaking, bad. If you don't want a global, then you don't want a singleton either.
The second issue is less obvious, but fundamentally, it attempts to solve a nonexistent problem.
When was the last time you accidentally instantiated a class, where you instead intended to reuse an existing instance?
When was the last time you accidentally typed "std::ostream() << "hello world << std::endl", when you meant "std::cout << "hello world << std::endl"?
It just doesn't happen. So we don't need to prevent this in the first place.
But more importantly, the gut feeling that "only one instance must exist" is almost always wrong.
What we usually mean is "I can currently only see a use for one instance".
but "I can only see a use for one instance" is not the same as "the application will come crashing down if anyone dares to create two instances".
In the latter case, a singleton might be justified. but in the former, it's really a premature design choice.
Usually, we do end up wanting more than one instance.
You often end up needing more than one logger. There's the log you write clean, structured messages to, for the client to monitor, and there's the one you dump debug data to for your own use.
It's also easy to imagine that you might end up using more than one database.
Or program settings. Sure, only one set of settings can be active at a time. But while they're active, the user might enter the "options" dialog and configure a second set of settings. He hasn't applied them yet, but once he hits 'ok', they have to be swapped in and replace the currently active set. And that means that until he's hit 'ok', two sets of options actually exist.
And more generally, unit testing:
One of the fundamental rules of unit tests is that they should be run in isolation. Each test should set up the environment from scratch, run the test, and tear everything down. Which means that each test will be wanting to create a new singleton object, run the test against it, and close it.
Which obviously isn't possible, because a singleton is created once, and only once. It can't be deleted. New instances can't be created.
So ultimately, the problem with singletons isn't technicalities like "it's hard to get thread safety correct", but a much more fundamental "they don't actually contribute anything positive to your code. They add two traits, each of them negative, to your codebase. Who would ever want that?"
About this unit testing concern. The main problems seems to be not with testing the singletons themselves, but with testing the objects that use them.
Such objects cannot be isolated for testing, since they have dependencies on singletons which are both hidden and hard to remove. It gets even worse if the singleton represents an interface to an external system (DB connection, payment processor, ICBM firing unit). Testing such an object might unexpectedly write into DB, send some money who knows where or even fire some intercontinental missiles.
I agree with the earlier sentiment that frequently they're used so that you don't have to pass an argument all over the place. I do it. The typical example is your system logging object. I typically would make that a singleton so I don't have to pass it all over the system.
Survey - In the example of the logging object, how many of you (show of hands) would add an extra arg to any routine that might need to log something -vs- use a singleton?
I wouldn't necessarily equate Singletons with Globals. Nothing should stop a developer from passing an instance of the object, singleton or otherwise, as a parameter, rather than conjure it out of the air. The intent of hiding its global accessibility could even be done by hiding its getInstance function to a few select friends.
As far as the unit testing flaw, Unit means small, so re-invoking the app to test the singleton a different way seems reasonable, unless I'm missing something the point.

Options for a message passing system for a game

I'm working on an RTS game in C++ targeted at handheld hardware (Pandora). For reference, the Pandora has a single ARM processor at ~600Mhz and runs Linux. We're trying to settle on a good message passing system (both internal and external), and this is new territory for me.
It may help to give an example of a message we'd like to pass. A unit may make this call to load its models into memory:
sendMessage("model-loader", "load-model", my_model.path, model_id );
In return, the unit could expect some kind of message containing a model object for the particular model_id, which can then be passed to the graphics system. Please note that this sendMessage function is in no way final. It just reflects my current understanding of message passing systems, which is probably not correct :)
From what I can tell there are two pretty distinct choices. One is to pass messages in memory, and only pass through the network when you need to talk to an external machine. I like this idea because the overhead seems low, but the big problem here is it seems like you need to make extensive use of mutex locking on your message queues. I'd really like to avoid excess locking if possible. I've read a few ways to implement simple queues without locking (by relying on atomic int operations) but these assume there is only one reader and one writer for a queue. This doesn't seem useful to our particular case, as an object's queue will have many writers and one reader.
The other choice is to go completely over the network layer. This has some fun advantages like getting asynchronous message passing pretty much for free. Also, we gain the ability to pass messages to other machines using the exact same calls as passing locally. However, this solution rubs me the wrong way, probably because I don't fully understand it :) Would we need a socket for every object that is going to be sending/receiving messages? If so, this seems excessive. A given game will have thousands of objects. For a somewhat underpowered device like the Pandora, I fear that abusing the network like that may end up being our bottleneck. But, I haven't run any tests yet, so this is just speculation.
MPI seems to be popular for message passing but it sure feels like overkill for what we want. This code is never going to touch a cluster or need to do heavy calculation.
Any insight into what options we have for accomplishing this is much appreciated.
The network will be using locking as well. It will just be where you cannot see it, in the OS kernel.
What I would do is create your own message queue object that you can rewrite as you need to. Start simple and make it better as needed. That way you can make it use any implementation you like behind the scenes without changing the rest of your code.
Look at several possible implementations that you might like to do in the future and design your API so that you can handle them all efficiently if you decide to implement in those terms.
If you want really efficient message passing look at some of the open source L4 microkernels. Those guys put a lot of time into fast message passing.
Since this is a small platform, it might be worth timing both approaches.
However, barring some kind of big speed issue, I'd always go for the approach that is simpler to code. That is probably going to be using the network stack, as it will be the same code no matter where the recipient is, and you won't have to manually code and degug your mutual exclusions, message buffering, allocations, etc.
If you find out it is too slow, you can always recode the local stuff using memory later. But why waste the time doing that up front if you might not have to?
I agree with Zan's recommendation to pass messages in memory whenever possible.
One reason is that you can pass complex objects C++ without needing to marshal and unmarshal (serialize and de-serialize) them.
The cost of protecting your message queue with a semaphore is most likely going to be less than the cost of making networking code calls.
If you protect your message queue with some lock-free algorithm (using atomic operations as you alluded to yourself) you can avoid a lot a context switches into and out of the kernel.

Is shared ownership of objects a sign of bad design?

Background: When reading Dr. Stroustrup's papers and FAQs, I notice some strong "opinions" and great advices from legendary CS scientist and programmer. One of them is about shared_ptr in C++0x. He starts explaining about shared_ptr and how it represents shared ownership of the pointed object. At the last line, he says and I quote:
. A shared_ptr represents shared
ownership but shared ownership isn't
my ideal: It is better if an object
has a definite owner and a definite,
predictable lifespan.
My Question: To what extent does RAII substitute other design patterns like Garbage Collection? I am assuming that manual memory management is not used to represent shared ownership in the system.
To what extent does RAII substitute other design patterns like Garbage Collection? I am assuming that manual memory management is not used to represent shared ownership in the system
Hmm, with GC, you don't really have to think about ownership. The object stays around as long as anyone needs it. Shared ownership is the default and the only choice.
And of course, everything can be done with shared ownership. But it sometimes leads to very clumsy code, because you can't control or limit the lifetime of an object. You have to use C#'s using blocks, or try/finally with close/dispose calls in the finally clause to ensure that the object gets cleaned up when it goes out of scope.
In those cases, RAII is a much better fit: When the object goes out of scope, all the cleanup should happen automatically.
RAII replaces GC to a large extent. 99% of the time, shared ownership isn't really what you want ideally. It is an acceptable compromise, in exchange for saving a lot of headaches by getting a garbage collector, but it doesn't really match what you want. You want the resource to die at some point. Not before, and not after. When RAII is an option, it leads to more elegant, concise and robust code in those cases.
RAII is not perfect though. Mainly because it doesn't deal that well with the occasional case where you just don't know the lifetime of an object. It has to stay around for a long while, as long as anyone uses it. But you don't want to keep it around forever (or as long as the scope surrounding all the clients, which might just be the entirety of the main function).
In those cases, C++ users have to "downgrade" to shared ownership semantics, usually implemented by reference-counting through shared_ptr. And in that case, a GC wins out. It can implement shared ownership much more robustly (able to handle cycles, for example), and more efficiently (the amortized cost of ref counting is huge compared to a decent GC)
Ideally, I'd like to see both in a language. Most of the time, I want RAII, but occasionally, I have a resource I'd just like to throw into the air and not worry about when or where it's going to land, and just trust that it'll get cleaned up when it's safe to do so.
The job of a programmer is to express things elegantly in his language of choice.
C++ has very nice semantics for construction and destruction of objects on the stack. If a resource can be allocated for the duration of a scope block, then a good programmer will probably take that path of least resistance. The object's lifetime is delimited by braces which are probably already there anyway.
If there's no good way to put the object directly on the stack, maybe it can be put inside another object as a member. Now its lifetime is a little longer, but C++ still doe a lot automatically. The object's lifetime is delimited by a parent object — the problem has been delegated.
There might not be one parent, though. The next best thing is a sequence of adoptive parents. This is what auto_ptr is for. Still pretty good, because the programmer should know what particular parent is the owner. The object's lifetime is delimited by the lifetime of its sequence of owners. One step down the chain in determinism and per se elegance is shared_ptr: lifetime delimited by the union of a pool of owners.
But maybe this resource isn't concurrent with any other object, set of objects, or control flow in the system. It's created upon some event happening and destroyed upon another event. Although there are a lot of tools for delimiting lifetimes by delegations and other lifetimes, they aren't sufficient for computing any arbitrary function. So the programmer might decide to write a function of several variables to determine whether an object is coming into existence or disappearing, and call new and delete.
Finally, writing functions can be hard. Maybe the rules governing the object would take too much time and memory to actually compute! And it might just be really hard to express them elegantly, getting back to my original point. So for that we have garbage collection: the object lifetime is delimited by when you want it and when you don't.
Sorry for the rant, but I think the best way to answer your question is context: shared_ptr is just a tool for computing the lifetime of an object, which fits into a broad spectrum of alternatives. It works when it works. It should be used when it's elegant. It should not be used if you have less than a pool of owners, or if you're trying to compute some sophisticated function using it as a convoluted way to increment/decrement.
My Question: To what extent does RAII substitute other design patterns
like Garbage Collection? I am assuming that manual memory management
is not used to represent shared ownership in the system.
I'm not sure about calling this a design pattern, but in my equally strong opinion, and just talking about memory resources, RAII tackles almost all the problems that GC can solve while introducing fewer.
Is shared ownership of objects a sign of bad design?
I share the thought that shared ownership is far, far from ideal in most cases, because the high-level design doesn't necessarily call for it. About the only time I've found it unavoidable is during the implementation of persistent data structures, where it's at least internalized as an implementation detail.
The biggest problem I find with GC or just shared ownership in general is that it doesn't free the developer of any responsibilities when it comes to application resources, but can give the illusion of doing so. If we have a case like this (Scene is the sole logical owner of the resource, but other things hold a reference/pointer to it like a camera storing a scene exclusion list defined by the user to omit from rendering):
And let's say the application resource is like an image, and its lifetime is tied to user input (ex: the image should be freed when the user requests to close a document containing it), then the work to properly free the resource is the same with or without GC.
Without GC, we might remove it from a scene list and allow its destructor to be invoked, while triggering an event to allow Thing1, Thing2, and Thing3 to set their pointers to it to null or remove them from a list so that they don't have dangling pointers.
With GC, it's basically the same thing. We remove the resource from the scene list while triggering an event to allow Thing1, Thing2, and Thing3 to set their references to null or remove them from a list so that the garbage collector can collect it.
The Silent Programmer Mistake Which Flies Under Radar
The difference in this scenario is what happens when a programmer mistake occurs, like Thing2 failing to handle the removal event. If Thing2 stores a pointer, it now has a dangling pointer and we might have a crash. That's catastrophic but something we might easily catch in our unit and integration tests, or at least something QA or testers will catch rather quickly. I don't work in a mission-critical or security-critical context, so if the crashy code managed to ship somehow, it's still not so bad if we can get a bug report, reproduce it, and detect it and fix it rather quickly.
If Thing2 stores a strong reference and shares ownership, we have a very silent logical leak, and the image won't be freed until Thing2 is destroyed (which it might not be destroyed until shutdown). In my domain, this silent nature of the mistake is very problematic, since it can go quietly unnoticed even after shipping until users start to notice that working in the application for an hour causes it to take gigabytes of memory, e.g., and starts slowing down until they restart it. And at that point, we might have accumulated a large number of these issues, since it's so easy for them to fly under the radar like a stealth fighter bug, and there's nothing I dislike more than stealth fighter bugs.
And it's due to that silent nature that I tend to dislike shared ownership with a passion, and TBH I never understood why GC is so popular (might be my particular domain -- I'm admittedly very ignorant of ones that are mission-critical, e.g.) to the point where I'm hungry for new languages without GC. I have found investigating all such leaks related to shared ownership to be very time-consuming, and sometimes investigating for hours only to find the leak was caused by source code outside of our control (third party plugin).
Weak References
Weak references are conceptually ideal to me for Thing1, Thing2, and Thing3. That would allow them to detect when the resource has been destroyed in hindsight without extending its lifetime, and perhaps we could guarantee a crash in those cases or some might even be able to gracefully deal with this in hindsight. The problem to me is that weak references are convertible to strong references and vice versa, so among the internal and third party developers out there, someone could still carelessly end up storing a strong reference in Thing2 even though a weak reference would have been far more appropriate.
I did try in the past to encourage the use of weak references as much as possible among the internal team and documenting that it should be used in the SDK. Unfortunately it was difficult to promote the practice among such a wide and mixed group of people, and we still ended up with our share of logical leaks.
The ease at which anyone, at any given time, can extend the lifetime of an object far longer than appropriate by simply storing a strong reference to it in their object starts to become a very frightening prospect when looking down at a huge codebase that's leaking massive resources. I often wish that a very explicit syntax was required to store any kind of strong reference as a member of an object of a kind which at least would lead a developer to think twice about doing it needlessly.
Explicit Destruction
So I tend to favor explicit destruction for persistent application resources, like so:
on_removal_event:
// This is ideal to me, not trying to release a bunch of strong
// references and hoping things get implicitly destroyed.
destroy(app_resource);
... since we can count on it to free the resource. We can't be completely assured that something out there in the system won't end up having a dangling pointer or weak reference, but at least those issues tend to be easy to detect and reproduce in testing. They don't go unnoticed for ages and accumulate.
The one tricky case has always been multithreading for me. In those cases, what I have found useful instead of full-blown garbage collection or, say, shared_ptr, is to simply defer destruction somehow:
on_removal_event:
// *May* be deferred until threads are finished processing the resource.
destroy(app_resource);
In some systems where the persistent threads are unified in a way such that they have a processing event, e.g., we can mark the resource to be destroyed in a deferred fashion in a time slice when threads are not being processed (almost starts to feel like stop-the-world GC, but we're keeping explicit destruction). In other cases, we might use, say, reference counting but in a way that avoids shared_ptr, where a resource's reference count starts at zero and will be destroyed using that explicit syntax above unless a thread locally extends its lifetime by incrementing the counter temporarily (ex: using a scoped resource in a local thread function).
As roundabout as that seems, it avoids exposing GC references or shared_ptr to the outside world which can easily tempt some developers (internally on your team or a third party developer) to store strong references (shared_ptr, e.g.) as a member of an object like Thing2 and thereby extend a resource's lifetime inadvertently, and possibly for far, far longer than appropriate (possibly all the way until application shutdown).
RAII
Meanwhile RAII automatically eliminates physical leaks just as well as GC, but furthermore, it works for resources other than just memory. We can use it for a scoped mutex, a file which automatically closes on destruction, we can use it even to automatically reverse external side effects through scope guards, etc. etc.
So if given the choice and I had to pick one, it's easily RAII for me. I work in a domain where those silent memory leaks caused by shared ownership are absolutely killer, and a dangling pointer crash is actually preferable if (and it likely will) be caught early during testing. Even in some really obscure event that it's caught late, if it manifests itself in a crash close to the site where the mistake occurred, that's still preferable than using memory profiling tools and trying to figure out who forgot to release a reference while wading through millions of lines of code. In my very blunt opinion, GC introduces more problems than it solves for my particular domain (VFX, which is somewhat similar to games in terms of scene organization and application state), and one of the reasons besides those very silent shared ownership leaks is because it can give the developers the false impression that they don't have to think about resource management and ownership of persistent application resources while inadvertently causing logical leaks left and right.
"When does RAII fail"
The only case I've ever encountered in my whole career where I couldn't think of any possible way to avoid shared ownership of some sort is when I implemented a library of persistent data structures, like so:
I used it to implement an immutable mesh data structure which can have portions modified without being made unique, like so (test with 4 million quadrangles):
Every single frame, a new mesh is being created as the user drags over it and sculpts it. The difference is that the new mesh is strong referencing parts not made unique by the brush so that we don't have to copy all the vertices, all the polygons, all the edges, etc. The immutable version trivializes thread safety, exception safety, non-destructive editing, undo systems, instancing, etc.
In this case the whole concept of the immutable data structure revolves around shared ownership to avoid duplicating data that wasn't made unique. It's a genuine case where we cannot avoid shared ownership no matter what (at least I can't think of any possible way).
That's about the only case where we might need GC or reference counting that I've encountered. Others might have encountered some of their own, but from my experience, very, very few cases genuinely need shared ownership at the design level.
Is Garbage Collection a Design Pattern? I don't know.
The big advantage of shared ownership, is its inherent predictability. With GC the reclamation of resources is out of your hands. Thats the point. When, and how it happens is usually not on the mind of the developer using it. With shared ownership, you are in control (beware, sometimes too much control is a bad thing). Lets say your app spawns off a million shared_ptr's to X. All of those are your doing, you are responsible for them and you have total control over when those references are created and destroyed. So a determined and careful programmer should know exaclty who references what and for how long. If you want an object to be destroyed, you have to destroy all the shared references to it, and viola, it's gone.
This carries some profound consequences for people who make realtime software, which MUST be totally predictable. This also means you can fudge up in ways that look an awful lot like memory leaks. I personally don't want to be a determined and careful programmer when I don't have to be (go ahead and laugh, I want to go on picnics and bike rides, not count my references), so where appropriate GC is my prefered route. I have written a little bit of realtime sound software, and used shared references to manage resources predictably.
Your question: When does RAII fail? (In the context of shared references)
My Answer: When you can't answer the question: who may have a reference to this? When vicious insipid circles of ownership develop.
My question: When does GC fail?
My answer: When you want total control and predictability. When the GC is the written by Sun Microsystems in a last minute dash to deadline and has ridiculous behaviors which could only have been designed and implemented by severely drunk protohuman code monkeys borrowed from Microsoft.
My opinion: I think BS is just really serious about clear design. It seems obvious that having one place where resources are destroyed is usually a clearer design than having many places where they might destroyed.