Why implement DB connection pointer object as a reference counting pointer? (C++)

At our company one of the core C++ classes (Database connection pointer) is implemented as a reference counting pointer. To be clear, the objects are NOT DB connections themselves, but pointers to a DB connection object.
The library is very old, and nobody who designed it is around anymore.
So far, neither I nor any of the C++ experts in the company that I asked have come up with a good reason for why this particular design was chosen. Any ideas?
It is introducing some problems (partly due to the awful reference-counted pointer implementation used), and I'm trying to understand whether this design actually has some deep underlying reasons.
The usage pattern these days seems to be that the DB connection pointer object is returned by a DB connection manager class, and it's somewhat unclear whether DB connection pointers were designed to be usable independently of the DB connection manager.

Probably it's a mistake. Without looking at the code it's impossible to know for sure, but the quality of the reference-counted pointer implementation is suggestive. Poor design, especially around resource management, is not unheard of in the C++ community</bitter sarcasm>.
With that said, reference-counted pointers are useful when you have objects of indeterminate lifetime which are very expensive to create, or whose state needs to be shared among several users. Depending on the underlying architecture, database connections could fit this definition: if each database connection needs to authenticate over the global internet, say, it could easily be worth your trouble to save a single connection and reuse it, rather than making new connections and disposing of them as you go.
But if I've understood you correctly, you don't have a single database connection object with a collection of refcounted pointers pointing to it. Rather, you have a database connection object, a collection of ordinary pointers to it, and a collection of refcounted pointers to those pointers. This is insanity, and almost certainly the result of confused thinking by the original developers. The alternative is that it was an act of deliberate evil, e.g. to ensure job security. If so they must have failed, as none of the perpetrators still work for your company.
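To make the distinction concrete, here is a minimal sketch (using a hypothetical DbConnection type; your library's names will differ) of refcounting the connection itself versus refcounting a pointer to it:

#include <memory>

struct DbConnection {                 // hypothetical stand-in for the real connection class
    void query(const char*) {}
};

// Reasonable use of reference counting: the count guards the connection itself,
// so the expensive connection dies exactly when the last user lets go of it.
using ConnectionHandle = std::shared_ptr<DbConnection>;

// Roughly the pattern described in the question: a refcounted pointer that points
// at *another pointer* to the connection. The count now manages the lifetime of a
// pointer, not of the connection, which buys nothing and adds an extra indirection.
using ConnectionPtrPtr = std::shared_ptr<DbConnection*>;

int main() {
    ConnectionHandle good = std::make_shared<DbConnection>();
    good->query("SELECT 1");

    DbConnection conn;                                          // lifetime managed elsewhere
    ConnectionPtrPtr odd = std::make_shared<DbConnection*>(&conn);
    (*odd)->query("SELECT 1");                                  // extra hop on every use
}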

Related

Why shouldn't I use shared_ptr and unique_ptr always and instead use normal pointers?

I have a background in C# and obj-c so RC/GC are things I (still) hold dear to me. As I started learning C++ in more depth, I can't stop wondering why I would use normal pointers when they are so unmanaged instead of other alternative solutions?
The shared_ptr provides a great way to store references without losing track of them and without having to delete them manually. I can see practical uses for normal pointers, but they seem like bad practice.
Can someone make a case for these alternatives?
Of course you're encouraged to use shared and unique ptr IF they're owning pointers. If you only need an observer however, a raw pointer will do just fine (a pointer bearing no responsibility for whatever it points to).
There is basically no overhead for std::unique_ptr, and there is some in std::shared_ptr as it does reference counting for you, but rarely will you ever need to save on execution time here.
Also, there is no need for "smart" pointers if you can guarantee the lifetime / ownership hierarchy by design; say a parent node in a tree outliving its children - although this comes back to whether the pointer actually owns something in the first place.
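A small sketch of that parent-outlives-children case (the Node type is made up for the example): the parent owns its children through unique_ptr, and the back-pointer is a plain observing raw pointer.

#include <memory>
#include <vector>

struct Node {
    Node* parent = nullptr;                        // observer: the parent owns us, not vice versa
    std::vector<std::unique_ptr<Node>> children;   // owners: children die with their parent

    Node* add_child() {
        children.push_back(std::make_unique<Node>());
        children.back()->parent = this;
        return children.back().get();              // hand out a non-owning pointer
    }
};

int main() {
    Node root;
    Node* child = root.add_child();   // raw pointer: valid as long as root lives, by design
    (void)child;
}   // root's destructor tears down the whole tree; no delete anywhere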
As someone else mentioned, in C++ you have to consider ownership. That being said, the 3D networked multiplayer FPS I'm currently working on has an official rule called "No new or delete." It uses only shared and unique pointers for designating ownership, and raw pointers retrieved from them (using .get()) everywhere that we need to interact with C APIs. The performance hit is not noticeable. I use this as an example to illustrate the negligible performance hit since games/simulations typically have the strictest performance requirements.
This has also significantly reduced the amount of time spent debugging and hunting down memory leaks. In theory, a well-designed application would never run into these problems. In real life working with deadlines, legacy systems, or existing game engines that were poorly designed, however, they are an inevitability on large projects like games... unless you use smart pointers. If you must dynamically allocate, don't have ample time for designing/rewriting the architecture or debugging problems related to resource management, and you want to get it off the ground as quickly as possible, smart pointers are the way to go and incur no noticeable performance cost even in large-scale games.
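As a rough illustration of that rule (the C API and file names are stand-ins, not from any particular project): ownership stays with a smart pointer, and only a raw pointer obtained via get() crosses the C boundary.

#include <cstdio>
#include <memory>

// Stand-in for a C API that takes a raw handle (hypothetical, for illustration only).
extern "C" void c_api_render(std::FILE* log) { std::fputs("frame rendered\n", log); }

void frame() {
    // Owning pointer with a custom deleter; no new/delete anywhere.
    std::unique_ptr<std::FILE, int (*)(std::FILE*)> log(std::fopen("frame.log", "a"),
                                                        &std::fclose);
    if (!log) return;

    // The C API only ever sees a raw, non-owning pointer obtained via get().
    c_api_render(log.get());
}   // fclose runs here automatically, even on early return or exception

int main() { frame(); }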
The question is more the opposite: why should you use smart pointers? Unlike C# (and I think Obj-C), C++ makes extensive use of value semantics, which means that unless an object has an application-specific lifetime (in which case, none of the smart pointers apply), you will normally be using value semantics and no dynamic allocation. There are exceptions, but if you make it a point of thinking in terms of value semantics (e.g. like int), of defining appropriate copy constructors and assignment operators where necessary, and of not allocating dynamically unless the object has an externally defined lifetime, you'll find that you rarely need to do anything more; everything just takes care of itself. Using smart pointers is very much the exception in most well written C++.
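A small illustration of what "thinking in terms of value semantics" means in practice (the Customer type is invented for the example):

#include <string>
#include <vector>

// A value type: copyable, assignable, no dynamic allocation visible to its users.
// The compiler-generated copy, move, and destructor do the right thing because the
// members (std::string, std::vector) already have value semantics themselves.
struct Customer {
    std::string name;
    std::vector<std::string> orders;
};

int main() {
    Customer a{"Alice", {"book"}};
    Customer b = a;                 // deep copy, just like copying an int
    b.orders.push_back("pen");      // a is unaffected
}                                   // both objects clean up after themselves; no pointers anywhere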
Sometimes you need to interoperate with C APIs, in which case you'll need to use raw pointers for at least those parts of the code.
In embedded systems, pointers are often used to access registers of hardware chips or memory at specific addresses.
Since the hardware registers already exist, there is no need to dynamically allocate them, and there is no memory leak if such a pointer is reassigned or discarded, because nothing was ever allocated.
Similarly with function pointers. Functions are not dynamically allocated and they have "fixed" addresses. Reassigning or discarding a function pointer will not cause a memory leak.
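A sketch of what that looks like on a hypothetical embedded target (the address and bit mask are invented; real values come from the chip's datasheet or vendor headers). Note that nothing here is allocated or freed:

#include <cstdint>

// Hypothetical register address for illustration only.
constexpr std::uintptr_t STATUS_REG_ADDR = 0x40021000;

inline volatile std::uint32_t& status_reg() {
    // The register already exists in hardware; we only point at it.
    // Nothing is allocated, so nothing needs to be freed or "smart"-managed.
    return *reinterpret_cast<volatile std::uint32_t*>(STATUS_REG_ADDR);
}

bool device_ready() {
    return (status_reg() & 0x1u) != 0;   // invented "ready" bit
}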

Debugging COM reference counters

In a project, I'm talking to an object in a .EXE server (the object performs expensive queries for me which should be cached), and I seem to have gotten my reference counts wrong, which makes the server process free an object it still holds a reference to, making the host process fail in curious and interesting ways that involve losing data and sending a bug report to the vendor.
Is there a way I can ask COM to raise some condition that is detectable in a debugger if a proxy object whose refcount has dropped to zero is used in some way?
It's not likely that this is possible using raw interfaces - the reference count is maintained by the COM server, and how it's implemented is up to the server: the implementation lives inside the server code, so unless you have the source and can debug the server, you have no way of getting to it.
However, it is likely your problem is caused by manually calling AddRef and Release. If that is the case, you can use a RAII/smart pointer solution. ATL provides one, but if for whatever reason you can't use that, it's easy enough to create your own. You can then not only create (or use provided) debugging facilities to keep track of reference counting, you will also be much less likely to get it wrong in the first place.
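For illustration, a hand-rolled sketch in the spirit of ATL's CComPtr (assuming a Windows build with the SDK headers; ATL's version is more complete, this only shows the shape of the RAII idea):

#include <utility>
#include <unknwn.h>   // IUnknown, from the Windows SDK

template <class T>
class ComPtr {
    T* p_ = nullptr;
public:
    ComPtr() = default;
    explicit ComPtr(T* p) : p_(p) {}                                 // adopts an existing reference
    ComPtr(const ComPtr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    ComPtr& operator=(ComPtr other) { std::swap(p_, other.p_); return *this; }
    ~ComPtr() { if (p_) p_->Release(); }                             // Release exactly once, always
    T* operator->() const { return p_; }
    T* get() const { return p_; }
    T** put() {                                                      // for COM out-parameters
        if (p_) { p_->Release(); p_ = nullptr; }
        return &p_;
    }
};

// Usage sketch (the interface and CLSID names are placeholders):
//   ComPtr<IQueryService> svc;
//   CoCreateInstance(CLSID_QueryService, nullptr, CLSCTX_LOCAL_SERVER,
//                    IID_IQueryService, reinterpret_cast<void**>(svc.put()));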

Problems with Singleton Pattern

I've been reading about the Singleton pattern for the last few days. The general perception is that the scenarios where it is required are quite few (if not rare), probably because it has its own set of problems, such as:
In a garbage collection environment it can be an issue with regards to memory management.
In a multithreaded environment it can cause bottlenecks and introduce synchronization problems.
Headache from a testing perspective.
I'm starting to get the ideas behind these issues but not totally sure about these concerns.
Like in the case of the garbage collection issue, is the usage of static in the singleton implementation (which is inherent to the pattern) the concern? Since it would mean that the static instance lasts for the lifetime of the application. Is that something that degrades memory management (it just means that the memory allocated to the singleton won't be freed)?
Of course in a multithreaded setup, having all the threads contend for the singleton instance would be a bottleneck. But how does usage of this pattern cause synchronization problems (surely we can use a mutex or something like that to synchronize access)?
From a (unit?) testing perspective, since singletons use static methods (which are difficult to mock or stub), they can cause problems. Not sure about this. Can someone please elaborate on this testing concern?
Thanks.
In a garbage collection environment it can be an issue with regards to memory management
In typical singleton implementations, once you create the singleton you can never destroy it. This indestructibility is sometimes acceptable when the singleton is small. However, if the singleton is massive, then you are unnecessarily using more memory than you should.
This is a bigger issue in languages where you have a garbage collector (like Java, Python, etc) because the garbage collector will always believe that the singleton is necessary. In C++, you can cheat by delete-ing the pointer. However, this opens its own can of worms because it's supposed to be a singleton, but by deleting it, you are making it possible to create a second one.
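For reference, a sketch of the kind of lazy singleton being discussed (names are illustrative); the leak is deliberate, which is exactly the "can never destroy it" property:

class Config {
public:
    // Classic lazy singleton: constructed on first use and never destroyed.
    static Config& instance() {
        static Config* the_one = new Config;   // intentionally leaked
        return *the_one;                       // lives until the process exits
    }
    Config(const Config&) = delete;            // "there can be only one"
    Config& operator=(const Config&) = delete;
private:
    Config() = default;
};

int main() {
    Config& c = Config::instance();   // every caller, anywhere, gets the same object
    (void)c;
}                                     // the Config is still alive here; its destructor never runs

Deleting it would require exposing the raw pointer, and as noted above, that cheat reopens the door to a second instance.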
In most cases, this over-use of memory does not degrade memory performance, but it can be considered the same as a memory leak. With a large singleton, you are wasting memory on your user's computer or device. (You can run into memory fragmentation if you allocate a huge singleton, but this is usually a non-concern).
In a multithreaded environment it can cause bottlenecks and introduce synchronization problems.
If every thread is accessing the same object and you are using a mutex, each thread must wait until another has unlocked the singleton. And if the threads depend greatly upon the singleton, then you will degrade performance to a single-thread environment because a thread spends most of its life waiting.
However, if your application domain allows it, you can create one object for each thread -- this way the thread does not spend time waiting and instead does the work.
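A minimal sketch of that one-object-per-thread idea using thread_local (the buffer type is arbitrary here):

#include <sstream>
#include <thread>

// One buffer per thread: no mutex, no contention, no shared singleton.
std::ostringstream& thread_buffer() {
    thread_local std::ostringstream buf;   // constructed once per thread, destroyed on thread exit
    return buf;
}

void worker(int id) {
    thread_buffer() << "worker " << id << " done\n";   // no locking needed; it's ours alone
}

int main() {
    std::thread a(worker, 1), b(worker, 2);
    a.join();
    b.join();
}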
Headache from a testing perspective.
Notably, a singleton's constructor can only be tested once. You have to create an entirely new test suite in order to test the constructor again. This is fine if your constructor doesn't take any parameters, but once you accept a parameter you can no longer effectively unit test.
Further, you can't stub out the singleton as effectively and your use of mock objects becomes difficult to use (there are ways around this, but it's more trouble than it's worth). Keep reading for more on this...
(And it leads to a bad design, too!)
Singletons are also a sign of a poor design. Some programmers want to make their database class a singleton. "Our application will never use two databases," they typically think. But there will come a time when it may make sense to use two databases, or when unit testing you will want to use two different SQLite databases. If you used a singleton, you will have to make some serious changes to your application. But if you used regular objects from the start, you can take advantage of OOP to get your task done efficiently and on time.
Most cases of singletons are the result of the programmer being lazy. They do not wish to pass around an object (e.g., a database object) to a bunch of methods, so they create a singleton that each method uses as an implicit parameter. But this approach bites for the reasons above.
Try to never use a singleton, if you can. Although they may seem like a good approach from the start, it almost always leads to poor design and hard-to-maintain code down the line.
If you haven't seen the article Singletons are Pathological Liars, you should read that too. It discusses how the interconnections between singletons are hidden from the interface, so the way you need to construct software is also hidden from the interface.
There are links to a couple of other articles on singletons by the same author.
When evaluating the Singleton pattern, you have to ask "What's the alternative? Would the same problems happen if I didn't use the Singleton pattern?"
Most systems have some need for Big Global Objects. These are items which are large and expensive (eg Database Connection Managers), or hold pervasive state information (for example, locking information).
The alternative to a Singleton is to have this Big Global Object created on startup, and passed as a parameter to all of the classes or methods that need access to this object.
Would the same problems happen in the non-singleton case? Let's examine them one by one:
Memory Management: The Big Global Object would exist when the application was started, and the object will exist until shutdown. As there is only one object, it will take up exactly the same amount of memory as the singleton case. Memory usage is not an issue. (#MadKeithV: Order of destruction at shutdown is a different issue).
Multithreading and bottlenecks: All of the threads would need to access the same object, whether they were passed this object as a parameter or whether they called MyBigGlobalObject.GetInstance(). So Singleton or not, you would still have the same synchronisation issues, (which fortunately have standard solutions). This isn't an issue either.
Unit testing: If you aren't using the Singleton pattern, then you can create the Big Global Object at the start of each test, and the garbage collector will take it away when the test completes. Each test will start with a new, clean environment that's unaffected by the previous test. Alternatively, in the Singleton case, the one object lives through ALL of the tests, and can easily become "contaminated". So yes, the Singleton pattern really bites when it comes to unit testing.
My preference: because of the unit testing issue alone, I tend to avoid the Singleton pattern. If it's one of the few environments where I don't have unit testing (for example, the user interface layer) then I might use Singletons, otherwise I avoid them.
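To make the "pass it as a parameter" alternative concrete (the names are invented for the example): the dependency is handed in once at startup instead of being fetched via GetInstance(), so a test can hand in a fresh or fake one every time.

#include <string>

// Hypothetical "Big Global Object".
class ConnectionManager {
public:
    std::string query(const std::string& sql) { return "result of " + sql; }
};

// Instead of calling ConnectionManager::GetInstance() internally, the dependency
// is passed in. Each unit test can construct a brand-new (or fake) manager.
class ReportGenerator {
    ConnectionManager& db_;
public:
    explicit ReportGenerator(ConnectionManager& db) : db_(db) {}
    std::string run() { return db_.query("SELECT count(*) FROM orders"); }
};

int main() {
    ConnectionManager manager;           // created once at startup...
    ReportGenerator report(manager);     // ...and handed to whoever needs it
    report.run();
}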
My main argument against singletons is basically that they combine two bad properties.
The things you mention can be a problem, sure, but they don't have to be. The synchronization thing can be fixed, it only becomes a bottleneck if many threads frequently access the singleton, and so on. Those issues are annoying, but not deal-breakers.
The much more fundamental problem with singletons is that what they're trying to do is fundamentally bad.
A singleton, as defined by the GoF, has two properties:
It is globally accessible, and
It prevents the class from ever being instantiated more than once.
The first one should be simple. Globals are, generally speaking, bad. If you don't want a global, then you don't want a singleton either.
The second issue is less obvious, but fundamentally, it attempts to solve a nonexistent problem.
When was the last time you accidentally instantiated a class, where you instead intended to reuse an existing instance?
When was the last time you accidentally typed std::ostream() << "hello world" << std::endl when you meant std::cout << "hello world" << std::endl?
It just doesn't happen. So we don't need to prevent this in the first place.
But more importantly, the gut feeling that "only one instance must exist" is almost always wrong.
What we usually mean is "I can currently only see a use for one instance".
but "I can only see a use for one instance" is not the same as "the application will come crashing down if anyone dares to create two instances".
In the latter case, a singleton might be justified. but in the former, it's really a premature design choice.
Usually, we do end up wanting more than one instance.
You often end up needing more than one logger. There's the log you write clean, structured messages to, for the client to monitor, and there's the one you dump debug data to for your own use.
It's also easy to imagine that you might end up using more than one database.
Or program settings. Sure, only one set of settings can be active at a time. But while they're active, the user might enter the "options" dialog and configure a second set of settings. He hasn't applied them yet, but once he hits 'ok', they have to be swapped in and replace the currently active set. And that means that until he's hit 'ok', two sets of options actually exist.
And more generally, unit testing:
One of the fundamental rules of unit tests is that they should be run in isolation. Each test should set up the environment from scratch, run the test, and tear everything down. Which means that each test will be wanting to create a new singleton object, run the test against it, and close it.
Which obviously isn't possible, because a singleton is created once, and only once. It can't be deleted. New instances can't be created.
So ultimately, the problem with singletons isn't technicalities like "it's hard to get thread safety correct", but a much more fundamental "they don't actually contribute anything positive to your code. They add two traits, each of them negative, to your codebase. Who would ever want that?"
About this unit testing concern. The main problem seems to be not with testing the singletons themselves, but with testing the objects that use them.
Such objects cannot be isolated for testing, since they have dependencies on singletons which are both hidden and hard to remove. It gets even worse if the singleton represents an interface to an external system (DB connection, payment processor, ICBM firing unit). Testing such an object might unexpectedly write into DB, send some money who knows where or even fire some intercontinental missiles.
I agree with the earlier sentiment that frequently they're used so that you don't have to pass an argument all over the place. I do it. The typical example is your system logging object. I typically would make that a singleton so I don't have to pass it all over the system.
Survey - In the example of the logging object, how many of you (show of hands) would add an extra arg to any routine that might need to log something -vs- use a singleton?
I wouldn't necessarily equate Singletons with Globals. Nothing should stop a developer from passing an instance of the object, singleton or otherwise, as a parameter, rather than conjure it out of the air. The intent of hiding its global accessibility could even be done by hiding its getInstance function to a few select friends.
As far as the unit testing flaw goes, Unit means small, so re-invoking the app to test the singleton a different way seems reasonable, unless I'm missing the point.

Is shared ownership of objects a sign of bad design?

Background: When reading Dr. Stroustrup's papers and FAQs, I notice some strong "opinions" and great advice from a legendary CS scientist and programmer. One of them is about shared_ptr in C++0x. He starts by explaining shared_ptr and how it represents shared ownership of the pointed-to object. In the last line, he says, and I quote:
A shared_ptr represents shared ownership but shared ownership isn't my ideal: It is better if an object has a definite owner and a definite, predictable lifespan.
My Question: To what extent does RAII substitute other design patterns like Garbage Collection? I am assuming that manual memory management is not used to represent shared ownership in the system.
To what extent does RAII substitute other design patterns like Garbage Collection? I am assuming that manual memory management is not used to represent shared ownership in the system
Hmm, with GC, you don't really have to think about ownership. The object stays around as long as anyone needs it. Shared ownership is the default and the only choice.
And of course, everything can be done with shared ownership. But it sometimes leads to very clumsy code, because you can't control or limit the lifetime of an object. You have to use C#'s using blocks, or try/finally with close/dispose calls in the finally clause to ensure that the object gets cleaned up when it goes out of scope.
In those cases, RAII is a much better fit: When the object goes out of scope, all the cleanup should happen automatically.
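In C++ terms, the cleanup is just the closing brace; a rough sketch:

#include <fstream>
#include <mutex>
#include <string>

std::mutex log_mutex;

void append_line(const std::string& line) {
    std::lock_guard<std::mutex> lock(log_mutex);   // unlocked automatically on scope exit
    std::ofstream out("app.log", std::ios::app);   // closed automatically on scope exit
    out << line << '\n';
}   // no using-block, no try/finally: leaving the scope *is* the cleanup

int main() {
    append_line("started");
}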
RAII replaces GC to a large extent. 99% of the time, shared ownership isn't really what you want ideally. It is an acceptable compromise, in exchange for saving a lot of headaches by getting a garbage collector, but it doesn't really match what you want. You want the resource to die at some point. Not before, and not after. When RAII is an option, it leads to more elegant, concise and robust code in those cases.
RAII is not perfect though. Mainly because it doesn't deal that well with the occasional case where you just don't know the lifetime of an object. It has to stay around for a long while, as long as anyone uses it. But you don't want to keep it around forever (or as long as the scope surrounding all the clients, which might just be the entirety of the main function).
In those cases, C++ users have to "downgrade" to shared ownership semantics, usually implemented by reference-counting through shared_ptr. And in that case, a GC wins out. It can implement shared ownership much more robustly (able to handle cycles, for example), and more efficiently (the amortized cost of ref counting is huge compared to a decent GC)
Ideally, I'd like to see both in a language. Most of the time, I want RAII, but occasionally, I have a resource I'd just like to throw into the air and not worry about when or where it's going to land, and just trust that it'll get cleaned up when it's safe to do so.
The job of a programmer is to express things elegantly in his language of choice.
C++ has very nice semantics for construction and destruction of objects on the stack. If a resource can be allocated for the duration of a scope block, then a good programmer will probably take that path of least resistance. The object's lifetime is delimited by braces which are probably already there anyway.
If there's no good way to put the object directly on the stack, maybe it can be put inside another object as a member. Now its lifetime is a little longer, but C++ still does a lot automatically. The object's lifetime is delimited by a parent object; the problem has been delegated.
There might not be one parent, though. The next best thing is a sequence of adoptive parents. This is what auto_ptr is for. Still pretty good, because the programmer should know what particular parent is the owner. The object's lifetime is delimited by the lifetime of its sequence of owners. One step down the chain in determinism and per se elegance is shared_ptr: lifetime delimited by the union of a pool of owners.
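In modern C++ the "sequence of adoptive parents" is spelled std::unique_ptr plus std::move (auto_ptr's replacement); a small sketch with invented names:

#include <memory>
#include <utility>

struct Widget {};

// Ownership is handed from parent to parent; at any moment there is exactly one owner.
std::unique_ptr<Widget> make_widget() {
    return std::make_unique<Widget>();          // first owner: the caller
}

class Window {
    std::unique_ptr<Widget> widget_;
public:
    void adopt(std::unique_ptr<Widget> w) {     // next owner: the window
        widget_ = std::move(w);
    }
};

int main() {
    auto w = make_widget();
    Window win;
    win.adopt(std::move(w));    // w is now empty; win owns the widget
}                               // win dies, the widget dies with it: deterministic, single owner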
But maybe this resource isn't concurrent with any other object, set of objects, or control flow in the system. It's created upon some event happening and destroyed upon another event. Although there are a lot of tools for delimiting lifetimes by delegations and other lifetimes, they aren't sufficient for computing any arbitrary function. So the programmer might decide to write a function of several variables to determine whether an object is coming into existence or disappearing, and call new and delete.
Finally, writing functions can be hard. Maybe the rules governing the object would take too much time and memory to actually compute! And it might just be really hard to express them elegantly, getting back to my original point. So for that we have garbage collection: the object lifetime is delimited by when you want it and when you don't.
Sorry for the rant, but I think the best way to answer your question is context: shared_ptr is just a tool for computing the lifetime of an object, which fits into a broad spectrum of alternatives. It works when it works. It should be used when it's elegant. It should not be used if you have less than a pool of owners, or if you're trying to compute some sophisticated function using it as a convoluted way to increment/decrement.
My Question: To what extent does RAII substitute other design patterns like Garbage Collection? I am assuming that manual memory management is not used to represent shared ownership in the system.
I'm not sure about calling this a design pattern, but in my equally strong opinion, and just talking about memory resources, RAII tackles almost all the problems that GC can solve while introducing fewer problems of its own.
Is shared ownership of objects a sign of bad design?
I share the thought that shared ownership is far, far from ideal in most cases, because the high-level design doesn't necessarily call for it. About the only time I've found it unavoidable is during the implementation of persistent data structures, where it's at least internalized as an implementation detail.
The biggest problem I find with GC, or just shared ownership in general, is that it doesn't free the developer of any responsibilities when it comes to application resources, but can give the illusion of doing so. Consider a case where a Scene is the sole logical owner of a resource, but other things (Thing1, Thing2, Thing3, say a camera storing a user-defined scene exclusion list of objects to omit from rendering) hold a reference/pointer to it.
And let's say the application resource is like an image, and its lifetime is tied to user input (ex: the image should be freed when the user requests to close a document containing it), then the work to properly free the resource is the same with or without GC.
Without GC, we might remove it from a scene list and allow its destructor to be invoked, while triggering an event to allow Thing1, Thing2, and Thing3 to set their pointers to it to null or remove them from a list so that they don't have dangling pointers.
With GC, it's basically the same thing. We remove the resource from the scene list while triggering an event to allow Thing1, Thing2, and Thing3 to set their references to null or remove them from a list so that the garbage collector can collect it.
The Silent Programmer Mistake Which Flies Under the Radar
The difference in this scenario is what happens when a programmer mistake occurs, like Thing2 failing to handle the removal event. If Thing2 stores a pointer, it now has a dangling pointer and we might have a crash. That's catastrophic but something we might easily catch in our unit and integration tests, or at least something QA or testers will catch rather quickly. I don't work in a mission-critical or security-critical context, so if the crashy code managed to ship somehow, it's still not so bad if we can get a bug report, reproduce it, and detect it and fix it rather quickly.
If Thing2 stores a strong reference and shares ownership, we have a very silent logical leak, and the image won't be freed until Thing2 is destroyed (which might not happen until shutdown). In my domain, this silent nature of the mistake is very problematic, since it can go quietly unnoticed even after shipping, until users start to notice that working in the application for an hour causes it to take gigabytes of memory and it starts slowing down until they restart it. And at that point, we might have accumulated a large number of these issues, since it's so easy for them to fly under the radar like a stealth fighter bug, and there's nothing I dislike more than stealth fighter bugs.
And it's due to that silent nature that I tend to dislike shared ownership with a passion, and TBH I never understood why GC is so popular (might be my particular domain -- I'm admittedly very ignorant of ones that are mission-critical, e.g.) to the point where I'm hungry for new languages without GC. I have found investigating all such leaks related to shared ownership to be very time-consuming, and sometimes investigating for hours only to find the leak was caused by source code outside of our control (third party plugin).
Weak References
Weak references are conceptually ideal to me for Thing1, Thing2, and Thing3. They would allow each of them to detect, after the fact, that the resource has been destroyed without extending its lifetime, and perhaps we could guarantee a crash in those cases, or some code might even be able to deal with it gracefully. The problem to me is that weak references are convertible to strong references and vice versa, so among the internal and third-party developers out there, someone could still carelessly end up storing a strong reference in Thing2 even though a weak reference would have been far more appropriate.
I did try in the past to encourage the use of weak references as much as possible among the internal team and documenting that it should be used in the SDK. Unfortunately it was difficult to promote the practice among such a wide and mixed group of people, and we still ended up with our share of logical leaks.
The ease at which anyone, at any given time, can extend the lifetime of an object far longer than appropriate by simply storing a strong reference to it in their object starts to become a very frightening prospect when looking down at a huge codebase that's leaking massive resources. I often wish that a very explicit syntax was required to store any kind of strong reference as a member of an object of a kind which at least would lead a developer to think twice about doing it needlessly.
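In std::weak_ptr terms, the Thing1/Thing2/Thing3 idea looks roughly like the sketch below (Image and Camera are placeholders): the observer can find out the resource is gone without ever extending its lifetime.

#include <cstdio>
#include <memory>

struct Image { /* pixel data would live here */ };

struct Camera {
    std::weak_ptr<Image> excluded;     // observes, never extends the image's lifetime

    void render() {
        if (std::shared_ptr<Image> img = excluded.lock()) {
            std::puts("image still alive: skip it");
        } else {
            std::puts("image already destroyed: nothing to exclude");
        }
    }
};

int main() {
    Camera cam;
    {
        auto img = std::make_shared<Image>();   // the document owns the image
        cam.excluded = img;
        cam.render();                           // "image still alive"
    }                                           // document closes, image is freed on time
    cam.render();                               // "image already destroyed": no dangling pointer, no leak
}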
Explicit Destruction
So I tend to favor explicit destruction for persistent application resources, like so:
on_removal_event:
// This is ideal to me, not trying to release a bunch of strong
// references and hoping things get implicitly destroyed.
destroy(app_resource);
... since we can count on it to free the resource. We can't be completely assured that something out there in the system won't end up having a dangling pointer or weak reference, but at least those issues tend to be easy to detect and reproduce in testing. They don't go unnoticed for ages and accumulate.
The one tricky case has always been multithreading for me. In those cases, what I have found useful instead of full-blown garbage collection or, say, shared_ptr, is to simply defer destruction somehow:
on_removal_event:
// *May* be deferred until threads are finished processing the resource.
destroy(app_resource);
In some systems where the persistent threads are unified in a way such that they have a processing event, e.g., we can mark the resource to be destroyed in a deferred fashion in a time slice when threads are not being processed (almost starts to feel like stop-the-world GC, but we're keeping explicit destruction). In other cases, we might use, say, reference counting but in a way that avoids shared_ptr, where a resource's reference count starts at zero and will be destroyed using that explicit syntax above unless a thread locally extends its lifetime by incrementing the counter temporarily (ex: using a scoped resource in a local thread function).
As roundabout as that seems, it avoids exposing GC references or shared_ptr to the outside world which can easily tempt some developers (internally on your team or a third party developer) to store strong references (shared_ptr, e.g.) as a member of an object like Thing2 and thereby extend a resource's lifetime inadvertently, and possibly for far, far longer than appropriate (possibly all the way until application shutdown).
RAII
Meanwhile RAII automatically eliminates physical leaks just as well as GC, but furthermore, it works for resources other than just memory. We can use it for a scoped mutex, a file which automatically closes on destruction, we can use it even to automatically reverse external side effects through scope guards, etc. etc.
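A scope guard is a tiny but useful illustration of that last point; here is a rough sketch (library versions such as Boost.ScopeExit are more polished):

#include <cstdio>
#include <utility>

// A minimal scope guard: runs a rollback action when the scope ends,
// unless dismiss() was called first.
template <class F>
class ScopeGuard {
    F rollback_;
    bool active_ = true;
public:
    explicit ScopeGuard(F f) : rollback_(std::move(f)) {}
    ~ScopeGuard() { if (active_) rollback_(); }
    void dismiss() { active_ = false; }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
};

void add_user_to_two_systems() {
    std::puts("added to system A");
    ScopeGuard undo([] { std::puts("rolled back system A"); });
    // ...if adding to system B throws or we return early, A gets rolled back...
    std::puts("added to system B");
    undo.dismiss();     // both succeeded: keep the side effects
}

int main() { add_user_to_two_systems(); }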
So if given the choice and I had to pick one, it's easily RAII for me. I work in a domain where those silent memory leaks caused by shared ownership are absolutely killer, and a dangling pointer crash is actually preferable if (and it likely will) be caught early during testing. Even in some really obscure event that it's caught late, if it manifests itself in a crash close to the site where the mistake occurred, that's still preferable than using memory profiling tools and trying to figure out who forgot to release a reference while wading through millions of lines of code. In my very blunt opinion, GC introduces more problems than it solves for my particular domain (VFX, which is somewhat similar to games in terms of scene organization and application state), and one of the reasons besides those very silent shared ownership leaks is because it can give the developers the false impression that they don't have to think about resource management and ownership of persistent application resources while inadvertently causing logical leaks left and right.
"When does RAII fail"
The only case I've ever encountered in my whole career where I couldn't think of any possible way to avoid shared ownership of some sort is when I implemented a library of persistent data structures.
I used it to implement an immutable mesh data structure which can have portions modified without being made unique (tested with 4 million quadrangles).
Every single frame, a new mesh is being created as the user drags over it and sculpts it. The difference is that the new mesh strongly references the parts not made unique by the brush, so that we don't have to copy all the vertices, all the polygons, all the edges, etc. The immutable version trivializes thread safety, exception safety, non-destructive editing, undo systems, instancing, etc.
In this case the whole concept of the immutable data structure revolves around shared ownership to avoid duplicating data that wasn't made unique. It's a genuine case where we cannot avoid shared ownership no matter what (at least I can't think of any possible way).
That's about the only case where we might need GC or reference counting that I've encountered. Others might have encountered some of their own, but from my experience, very, very few cases genuinely need shared ownership at the design level.
Is Garbage Collection a Design Pattern? I don't know.
The big advantage of shared ownership is its inherent predictability. With GC, the reclamation of resources is out of your hands. That's the point. When and how it happens is usually not on the mind of the developer using it. With shared ownership, you are in control (beware, sometimes too much control is a bad thing). Let's say your app spawns off a million shared_ptrs to X. All of those are your doing; you are responsible for them, and you have total control over when those references are created and destroyed. So a determined and careful programmer should know exactly who references what and for how long. If you want an object to be destroyed, you have to destroy all the shared references to it, and voilà, it's gone.
This carries some profound consequences for people who make realtime software, which MUST be totally predictable. This also means you can fudge up in ways that look an awful lot like memory leaks. I personally don't want to be a determined and careful programmer when I don't have to be (go ahead and laugh, I want to go on picnics and bike rides, not count my references), so where appropriate GC is my preferred route. I have written a little bit of realtime sound software, and used shared references to manage resources predictably.
Your question: When does RAII fail? (In the context of shared references)
My Answer: When you can't answer the question: who may have a reference to this? When vicious insipid circles of ownership develop.
My question: When does GC fail?
My answer: When you want total control and predictability. When the GC is the one written by Sun Microsystems in a last-minute dash to deadline and has ridiculous behaviors which could only have been designed and implemented by severely drunk protohuman code monkeys borrowed from Microsoft.
My opinion: I think BS is just really serious about clear design. It seems obvious that having one place where resources are destroyed is usually a clearer design than having many places where they might be destroyed.

Garbage Collection in C++ -- why?

I keep hearing people complaining that C++ doesn't have garbage collection. I also hear that the C++ Standards Committee is looking at adding it to the language. I'm afraid I just don't see the point to it... using RAII with smart pointers eliminates the need for it, right?
My only experience with garbage collection was on a couple of cheap eighties home computers, where it meant that the system would freeze up for a few seconds every so often. I'm sure it has improved since then, but as you can guess, that didn't leave me with a high opinion of it.
What advantages could garbage collection offer an experienced C++ developer?
I keep hearing people complaining that C++ doesn't have garbage collection.
I am so sorry for them. Seriously.
C++ has RAII, and I always complain to find no RAII (or a castrated RAII) in Garbage Collected languages.
What advantages could garbage collection offer an experienced C++ developer?
Another tool.
Matt J wrote it quite right in his post (Garbage Collection in C++ -- why?): we don't need C++ features, as most of them could be coded in C, and we don't need C features, as most of them could be coded in assembly, etc. C++ must evolve.
As a developer: I don't care about GC. I tried both RAII and GC, and I find RAII vastly superior. As said by Greg Rogers in his post (Garbage Collection in C++ -- why?), memory leaks are not so terrible (at least in C++, where they are rare if C++ is really used) as to justify GC instead of RAII. GC has non-deterministic deallocation/finalization and is just a way to write code that doesn't care about specific memory choices.
This last sentence is important: it is important to write code that "just doesn't care". In the same way that in C++ RAII we don't care about resource freeing because RAII does it for us, or about object initialization because the constructor does it for us, it is sometimes important to just code without caring about who owns what memory, and what kind of pointer (shared, weak, etc.) we need for this or that piece of code. There seems to be a need for GC in C++ (even if I personally fail to see it).
An example of good GC use in C++
Sometimes, in an app, you have "floating data". Imagine a tree-like structure of data, but no one is really "owner" of the data (and no one really cares about when exactly it will be destroyed). Multiple objects can use it, and then, discard it. You want it to be freed when no one is using it anymore.
The C++ approach is using a smart pointer. The boost::shared_ptr comes to mind. So each piece of data is owned by its own shared pointer. Cool. The problem is that each piece of data can refer to another piece of data. You cannot use shared pointers alone because they use a reference counter, which won't support circular references (A points to B, and B points to A). So you must now think a lot about where to use weak pointers (boost::weak_ptr), and when to use shared pointers.
With a GC, you just use the tree structured data.
The downside being that you must not care when the "floating data" will really be destroyed. Only that it will be destroyed.
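The cycle problem described above, in its smallest form; the weak_ptr version is the "think a lot about where to use weak pointers" part:

#include <memory>

struct Node {
    std::shared_ptr<Node> next;        // strong in both directions -> cycle -> leak
};

struct FixedNode {
    std::shared_ptr<FixedNode> next;
    std::weak_ptr<FixedNode> prev;     // one direction demoted to weak -> no cycle
};

int main() {
    {   // Leaks: A and B keep each other alive even after the local pointers go out of scope.
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;
    }   // use_count never reaches zero; neither Node is destroyed

    {   // No leak: the weak back-pointer doesn't contribute to the count.
        auto a = std::make_shared<FixedNode>();
        auto b = std::make_shared<FixedNode>();
        a->next = b;
        b->prev = a;
    }   // both destroyed here
}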
Conclusion
So in the end, if done properly, and compatible with the current idioms of C++, GC would be a Yet Another Good Tool for C++.
C++ is a multiparadigm language: Adding a GC will perhaps make some C++ fanboys cry treason, but in the end, it could be a good idea, and I guess the C++ Standards Committee won't let this kind of major feature break the language, so we can trust them to do the necessary work to enable a correct C++ GC that won't interfere with C++: As always in C++, if you don't need a feature, don't use it and it will cost you nothing.
The short answer is that garbage collection is very similar in principle to RAII with smart pointers. If every piece of memory you ever allocate lies within an object, and that object is only referred to by smart pointers, you have something close to garbage collection (potentially better). The advantage comes from not having to be so judicious about scoping and smart-pointering every object, and letting the runtime do the work for you.
This question seems analogous to "what does C++ have to offer the experienced assembly developer? instructions and subroutines eliminate the need for it, right?"
With the advent of good memory checkers like valgrind, I don't see much use to garbage collection as a safety net "in case" we forgot to deallocate something - especially since it doesn't help much in managing the more generic case of resources other than memory (although these are much less common). Besides, explicitly allocating and deallocating memory (even with smart pointers) is fairly rare in the code I've seen, since containers are a much simpler and better way usually.
But garbage collection can potentially offer performance benefits, especially if a lot of short-lived objects are being heap allocated. GC also potentially offers better locality of reference for newly created objects (comparable to objects on the stack).
The motivating factor for GC support in C++ appears to be lambda programming, anonymous functions etc. It turns out that lambda libraries benefit from the ability to allocate memory without caring about cleanup. The benefit for ordinary developers would be simpler, more reliable and faster compiling lambda libraries.
GC also helps simulate infinite memory; the only reason you need to delete PODs is that you need to recycle memory. If you have either GC or infinite memory, there is no need to delete PODs anymore.
I don't understand how one can argue that RAII replaces GC, or is vastly superior. There are many cases handled by a gc that RAII simply cannot deal with at all. They are different beasts.
First, RAII is not bullet proof: it works against some common failures which are pervasive in C++, but there are many cases where RAII does not help at all; it is fragile to asynchronous events (like signals under UNIX). Fundamentally, RAII relies on scoping: when a variable is out of scope, it is automatically freed (assuming the destructor is correctly implemented of course).
Here is a simple example where neither auto_ptr nor RAII can help you:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <memory>

using namespace std;

volatile sig_atomic_t got_sigint = 0;

class A {
public:
    A()  { printf("ctor\n"); }
    ~A() { printf("dtor\n"); }
};

void catch_sigint(int sig)
{
    got_sigint = 1;
}

/* Emulate expensive computation */
void do_something()
{
    sleep(3);
}

void handle_sigint()
{
    printf("Caught SIGINT\n");
    exit(EXIT_FAILURE);
}

int main(void)
{
    A a;
    auto_ptr<A> aa(new A);
    signal(SIGINT, catch_sigint);
    while (1) {
        if (got_sigint == 0) {
            do_something();
        } else {
            handle_sigint();
            return -1;
        }
    }
}
The destructor of A will never be called. Of course, it is an artificial and somewhat contrived example, but a similar situation can actually happen; for example when your code is called by other code which handles SIGINT and which you have no control over at all (concrete example: mex extensions in Matlab). It is the same reason why finally in Python does not guarantee execution of something. GC can help you in this case.
Other idioms do not play well with this: in any non-trivial program, you will need stateful objects (I am using the word object in a very broad sense here; it can be any construction allowed by the language); if you need to control the state outside one function, you can't easily do that with RAII (which is why RAII is not that helpful for asynchronous programming). OTOH, a GC has a view of the whole memory of your process, that is, it knows about all the objects it allocated, and can clean up asynchronously.
It can also be much faster to use a GC, for the same reasons: if you need to allocate/deallocate many objects (in particular small objects), a GC will vastly outperform RAII unless you write a custom allocator, since the GC can allocate/clean many objects in one pass. Some well known C++ projects use GC, even where performance matters (see for example Tim Sweeney on the use of GC in Unreal Tournament: http://lambda-the-ultimate.org/node/1277). GC basically increases throughput at the cost of latency.
Of course, there are cases where RAII is better than GC; in particular, the GC concept is mostly concerned with memory, and that's not the only resource. Things like files, locks, etc. can be handled well with RAII. Languages without manual memory handling, like Python or Ruby, do have something like RAII for those cases, BTW (the with statement in Python). RAII is very useful when you precisely need to control when the resource is freed, and that's quite often the case for files or locks, for example.
The committee isn't adding garbage-collection, they are adding a couple of features that allow garbage collection to be more safely implemented. Only time will tell whether they actually have any effect whatsoever on future compilers. The specific implementations could vary widely, but will most likely involve reachability-based collection, which could involve a slight hang, depending on how it's done.
One thing is, though, no standards-conformant garbage collector will be able to call destructors - only to silently reuse lost memory.
What advantages could garbage collection offer an experienced C++ developer?
Not having to chase down resource leaks in your less-experienced colleagues' code.
It's an all-too-common error to assume that because C++ does not have garbage collection baked into the language, you can't use garbage collection in C++, period. This is nonsense. I know of elite C++ programmers who use the Boehm collector as a matter of course in their work.
Garbage collection allows you to postpone the decision about who owns an object.
C++ uses value semantics, so with RAII, objects are indeed reclaimed when going out of scope. This is sometimes referred to as "immediate GC".
When your program starts using reference semantics (through smart pointers etc.), the language no longer supports you; you're left to the wit of your smart pointer library.
The tricky thing about GC is deciding upon when an object is no longer needed.
Garbage collection makes RCU lockless synchronization much easier to implement correctly and efficiently.
Easier thread safety and scalability
There is one property of GC which may be very important in some scenarios. Assignment of a pointer is naturally atomic on most platforms, while creating thread-safe reference-counted ("smart") pointers is quite hard and introduces significant synchronization overhead. As a result, smart pointers are often said "not to scale well" on multi-core architectures.
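A rough sketch of why this matters (the names are invented): publishing an immutable object through a plain atomic pointer is cheap; the expensive question is when the *old* object may be reclaimed, which is exactly what a GC (or RCU grace periods, or hazard pointers) answers, and what a thread-safe reference count answers by paying for atomic updates on every copy.

#include <atomic>
#include <string>

struct Config { std::string endpoint; };   // treated as immutable once published

std::atomic<const Config*> current_config{nullptr};

const Config* snapshot() {
    // Readers just load a pointer: no lock, no reference-count bump on the hot path.
    return current_config.load(std::memory_order_acquire);
}

void publish(const Config* fresh) {
    current_config.store(fresh, std::memory_order_release);
    // The hard part under manual management: when is it safe to delete the previous
    // Config, given that some reader thread may still be using it? That is the question
    // a GC answers for free, and that shared_ptr answers with atomic refcount traffic.
}

int main() {
    static const Config v1{"https://example.com/v1"};   // static storage sidesteps reclamation in this sketch
    publish(&v1);
    snapshot();
}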
Garbage collection is really the basis for automatic resource management. And having GC changes the way you tackle problems in a way that is hard to quantify. For example when you are doing manual resource management you need to:
Consider when an item can be freed (are all modules/classes finished with it?)
Consider whose responsibility it is to free a resource when it is ready to be freed (which class/module should free this item?)
In the trivial case there is no complexity. E.g. you open a file at the start of a method and close it at the end. Or the caller must free this returned block of memory.
Things start to get complicated quickly when you have multiple modules that interact with a resource and it is not as clear who needs to clean up. The end result is that the whole approach to tackling a problem includes certain programming and design patterns which are a compromise.
In languages that have garbage collection you can use a disposable pattern where you free resources you know you've finished with, but if you fail to free them, the GC is there to save the day.
Smart pointers are actually a perfect example of the compromises I mentioned. Smart pointers can't save you from leaking cyclic data structures unless you have a backup mechanism. To avoid this problem you often compromise and avoid using a cyclic structure even though it may otherwise be the best fit.
I, too, have doubts that the C++ committee is adding a full-fledged garbage collector to the standard.
But I would say that the main reason for adding/having garbage collection in a modern language is that there are too few good reasons against garbage collection. Since the eighties there have been several huge advances in the field of memory management and garbage collection, and I believe there are even garbage collection strategies that could give you soft-real-time-like guarantees (like, "GC won't take more than .... in the worst case").
using RAII with smart pointers eliminates the need for it, right?
Smart pointers can be used to implement reference counting in C++ which is a form of garbage collection (automatic memory management) but production GCs no longer use reference counting because it has some important deficiencies:
Reference counting leaks cycles. Consider A↔B, both objects A and B refer to each other so they both have a reference count of 1 and neither is collected but they should both be reclaimed. Advanced algorithms like trial deletion solve this problem but add a lot of complexity. Using weak_ptr as a workaround is falling back to manual memory management.
Naive reference counting is slow for several reasons. Firstly, it requires out-of-cache reference counts to be bumped often (see Boost's shared_ptr up to 10× slower than OCaml's garbage collection). Secondly, destructors injected at the end of scope can incur unnecessary-and-expensive virtual function calls and inhibit optimizations such as tail call elimination.
Scope-based reference counting keeps floating garbage around as objects are not recycled until the end of scope whereas tracing GCs can reclaim them as soon as they become unreachable, e.g. can a local allocated before a loop be reclaimed during the loop?
What advantages could garbage collection offer an experienced C++ developer?
Productivity and reliability are the main benefits. For many applications, manual memory management requires significant programmer effort. By simulating an infinite-memory machine, garbage collection liberates the programmer from this burden which allows them to focus on problem solving and evades some important classes of bugs (dangling pointers, missing free, double free). Furthermore, garbage collection facilitates other forms of programming, e.g. by solving the upwards funarg problem (1970).
In a framework that supports GC, a reference to an immutable object such as a string may be passed around in the same way as a primitive. Consider the class (C# or Java):
public class MaximumItemFinder
{
    String maxItemName = "";
    int maxItemValue = -2147483647 - 1;

    public void AddAnother(int itemValue, String itemName)
    {
        if (itemValue >= maxItemValue)
        {
            maxItemValue = itemValue;
            maxItemName = itemName;
        }
    }

    public String getMaxItemName() { return maxItemName; }
    public int getMaxItemValue()  { return maxItemValue; }
}
Note that this code never has to do anything with the contents of any of the strings, and can simply treat them as primitives. A statement like maxItemName = itemName; will likely generate two instructions: a register load followed by a register store. The MaximumItemFinder will have no way of knowing whether callers of AddAnother are going to retain any reference to the passed-in strings, and callers will have no way of knowing how long MaximumItemFinder will retain references to them. Callers of getMaxItemName will have no way of knowing if and when MaximumItemFinder and the original supplier of the returned string have abandoned all references to it. Because code can simply pass string references around like primitive values, however, none of those things matter.
Note also that while the class above would not be thread-safe in the presence of simultaneous calls to AddAnother, any call to GetMaxItemName would be guaranteed to return a valid reference to either an empty string or one of the strings that had been passed to AddAnother. Thread synchronization would be required if one wanted to ensure any relationship between the maximum-item name and its value, but memory safety is assured even in its absence.
I don't think there's any way to write a method like the above in C++ which would uphold memory safety in the presence of arbitrary multi-threaded usage without either using thread synchronization or else requiring that every string variable have its own copy of its contents, held in its own storage space, which may not be released or relocated during the lifetime of the variable in question. It would certainly not be possible to define a string-reference type which could be defined, assigned, and passed around as cheaply as an int.
Garbage Collection Can Make Leaks Your Worst Nightmare
Full-fledged GC that handles things like cyclic references would be somewhat of an upgrade over a ref-counted shared_ptr. I would somewhat welcome it in C++, but not at the language level.
One of the beauties about C++ is that it doesn't force garbage collection on you.
I want to correct a common misconception: the myth that garbage collection somehow eliminates leaks. From my experience, the worst nightmares of debugging code written by others and trying to spot the most expensive logical leaks have involved garbage collection, with languages like embedded Python running inside a resource-intensive host application.
When talking about subjects like GC, there's theory and then there's practice. In theory it's wonderful and prevents leaks. Yet at the theoretical level, every language is wonderful and leak-free, since in theory everyone would write perfectly correct code and test every single possible case where a single piece of code could go wrong.
Garbage collection combined with less-than-ideal team collaboration caused the worst, hardest-to-debug leaks in our case.
The problem still has to do with ownership of resources. You have to make clear design decisions here when persistent objects are involved, and garbage collection makes it all too easy to think that you don't.
Given some resource, R, in a team environment where the developers aren't constantly communicating and reviewing each other's code carefully at all times (something a little too common in my experience), it becomes quite easy for developer A to store a handle to that resource. Developer B does as well, perhaps in an obscure way that indirectly adds R to some data structure. So does C. In a garbage-collected system, this has created three owners of R.
Because developer A was the one who created the resource originally and thinks he's the owner of it, he remembers to release the reference to R when the user indicates that he no longer wants to use it. After all, if he failed to do so, nothing would happen and it would be obvious from testing that the user-end removal logic did nothing. So he remembers to release it, as any reasonably competent developer would do. This triggers an event which B handles, and B also remembers to release its reference to R.
However, C forgets. He's not one of the stronger developers on the team: a somewhat fresh recruit who has only worked in the system for a year. Or maybe he's not even on the team, just a popular third party developer writing plugins for our product that many users add to the software. With garbage collection, this is when we get those silent logical resource leaks. They're the worst kind: they do not necessarily manifest in the user-visible side of the software as an obvious bug besides the fact that over durations of running the program, the memory usage just continues to rise and rise for some mysterious purpose. Trying to narrow down these issues with a debugger can be about as fun as debugging a time-sensitive race condition.
Without garbage collection, developer C would have created a dangling pointer. He may try to access it at some point and cause the software to crash. Now that's a testing/user-visible bug. C gets embarrassed a bit and corrects his bug. In the GC scenario, just trying to figure out where the system is leaking may be so difficult that some of the leaks are never corrected. These are not valgrind-type physical leaks that can be detected easily and pinpointed to a specific line of code.
With garbage collection, developer C has created a very mysterious leak. His code may continue to access R which is now just some invisible entity in the software, irrelevant to the user at this point, but still in a valid state. And as C's code creates more leaks, he's creating more hidden processing on irrelevant resources, and the software is not only leaking memory but also getting slower and slower each time.
So garbage collection does not necessarily mitigate logical resource leaks. It can, in less-than-ideal scenarios, make it far easier for leaks to silently go unnoticed and remain in the software. The developers might get so frustrated trying to trace down their GC logical leaks that they simply tell their users to restart the software periodically as a workaround. It does eliminate dangling pointers, and in safety-obsessed software where crashing is completely unacceptable under any scenario, I would prefer GC. But I'm often working in less safety-critical but resource-intensive, performance-critical products where a crash that can be fixed promptly is preferable to a really obscure and mysterious silent bug, and resource leaks are not trivial bugs there.
In both of these cases, we're talking about persistent objects not residing on the stack, like a scene graph in a 3D software or the video clips available in a compositor or the enemies in a game world. When resources tie their lifetimes to the stack, both C++ and any other GC language tend to make it trivial to manage resources properly. The real difficulty lies in persistent resources referencing other resources.
In C or C++, you can have dangling pointers and crashes resulting from segfaults if you fail to clearly designate who owns a resource and when handles to them should be released (ex: set to null in response to an event). Yet in GC, that loud and obnoxious but often easy-to-spot crash is exchanged for a silent resource leak that may never be detected.