Using AutoZone garbage collector - c++

Has anyone tried to use the autozone garbage collector from Apple? Or can you point to a good and configurable one usable with C++?
edit: I work on decision diagrams (like BDD), so I would like to test if managing the memory with a garbage collector is efficient in this case.
edit 2: To be more precise, when implementing a library for decision diagrams, you HAVE TO implement a garbage collector. In fact, I already did this for my library, but it represents more or less 25% of the code. And it is the most complicated part :-) So yes, I want a garbage collector :-) And yes, I already use RAII techniques. And, finally, I can not afford the cost of a shared_ptr, because I store billions objects which need to be garbage collected.

Have you already analyzed if you really need an implicit garbage collection library? are you sure it is not just java (or Objective C, ...) nostalgia?
That is not natural in C++, so you will probably get into more problems than you solve. Actual implementations are mostly used in experimentation tests, and not for production apps. The best way to squeeze the potential of a language is to do things the way are tackled in that language.
Check first if explicit garbage collection (boost::shared_ptr and friends) cover your needs, and avoid introducing complexity when possible.
After Alexandre edit 2: Magic does not exist I'm afraid. Why do you think a garbage collector will be more efficient than RAII idioms.
If you don't need reference counting you can use scoped_ptr. But if you need it, you will have to pay for it, apart from how much you hide it.
Maybe your problem is allocating dinamically so many objects. If they are small ones, you will find really interesting the chapter 4 (Small-Object Allocation) of "Modern C++ Design" (Andrei Alexandrescu).

Most people tend to avoid garbage collectors in C++.
They are generally not necessary, once you learn to use RAII to manage your resources, and because C++ does not have proper support for garbage collection, the GC's that exist have a couple of problems:
They don't catch every allocation (they have to make a conservative guess of whether some allocation is referenced or not)
They may not play nice with destructors
Of course there are situations where a GC in C++ is useful. But 95% of the cases, you'll be better served simply by learning the appropriate memory management techniques (RAII) yourself.
But I haven't used Autozone, and don't know how well it works in your case or in general.

Actually, Garbage Collection was a part of the upcoming C++ 20XX standard, but was dropped for reasons of difficulty of implementation, complexity, etc...
So, sure, lots of people avoid GC in C++, but there is a strong enough demand that the standards committee is actively considering it.
Apple's AutoZone is a language agnostic garbage collector that could be bent for use with C++. Certainly, that AutoZone works for Objective-C (and C) would make for a good foundation implementation.
AutoZone is also used by the MacRuby project and, I believe, a handful of other projects. It is designed to be portable, though the implementation has bits specific to the x86 and ppc architectures -- you would need to port it to other CPU types, if necessary.
The collector has an API that can be used directly to register/unregister objects and express connectivity, etc...
It wouldn't be easy, but it is certainly doable.

No I haven't tried that. You may try this from hp labs, with more details going here. This collector works on Linux, *BSD, recent Windows versions, MacOS X, HP/UX, Solaris, Tru64, Irix and a few other operating systems.

Related

C and C++ lack of garbage collection

Why some languages like C and C++ don't have garbage collection? I am used to Java, so I'm not sure what the benefits are of not having it?
Performance. There are generally two types of memory management for higher level languages. There is the garbage collection approach, and there is the reference counting found in Apple systems. Both of them incur a cost of performance when tracing objects and de-allocating them, and that has a footprint on the memory and CPU time needed.
Since C & C++ are old languages relatively, they were designed to fit with embedded systems and wide range of devices with many constraints, thus they could not afford to have a memory management.
With the right amount of practice and exposure, it should not be that difficult to manage C/C++ apps relatively well.
BTW, C# has a garbage collection. Only C and C++ don't have one.
Edit:
As the others might have added, in newer standards of C++, there is the shared pointer shared_ptr that applies reference counting approach to memory management.
In addition to what other people have pointed out about performance (which is perfectly correct), I'd also like to point out that garbage collection is a major problem for real-time systems due to the unpredictability that that it introduces.
Recall that a real-time system is a system in which tasks must meet certain time constraints in order to be correct. For example, if you're writing code for a robot and it doesn't figure out that it's about to hit a wall in time to stop, then clearly that's not a correct result. The fact that you figured out that you're about to hit a wall after you've already done so is completely useless.
From the Microsoft documentation,
To reclaim objects, the garbage collector must stop all the executing threads in an application. In some situations, such as when an application retrieves data or displays content, a full garbage collection can occur at a critical time and impede performance.
This is particularly problematic since it can be very difficult or impossible to predict when the garbage collector is going to run or exactly how long it'll take to complete. This could potentially cause tasks to miss their deadline.
The article I linked to above describes in a lot more detail the performance implications of the garbage collector and ways you can minimize its impact in time-critical situations.
For what it's worth, there actually are efforts to make real-time versions of C# and Java. See, for example, Real Time Java.
I do realize that the vast majority of systems aren't real-time, but for the ones that are (engine controls, etc.) there's an obvious benefit for having a language without garbage collection.
Because it is more efficient that way. C* languages are geared towards:
producing as fast a binary program as possible
still be easy to understand for humans (so assembler, while faster, does not fulfill this requirement)
If you did automatic garbage collection, this takes time, and some times, it is not necessary for the program to run correctly. The programmer can decide on a case by case basis, if it is necessary or not.
C# does have garbage collection.
Garbage collection is really nice for programmers, but it requires a runtime cost. C/C++ are systems programming languages and, as such, they need to be able to run on bare metal - with as minimal a runtime as possible. This means that things like Garbage Collection are not possible.
Also, Garbage Collection can make it very difficult to reason about the memory consumption of your program. If you're designing, say, a real-time system responsible for keeping an airplane in the air, you don't want to risk a GC pause causing a catastrophic failure.
There are a number of reasons, at least with respect to C:
C is a product of the early 1970s - note that older languages like Fortran, Cobol, Pascal, etc., don't have automatic garbage collection either;
C was derived from a systems programming language, and as such tends to not offer many high-level abstractions (pointers and streams are about as high level as it gets);
C and C++ are (typically) compiled to native binaries running directly on top of the OS (or even bare metal) - note that most languages that do automatic garbage collection run under a VM or an interpreter;
The philosophy of C is that the programmer is in the best position to know when a resource should be allocated or released, and is skilled enough to write the code to manage it;
As others have mentioned, AGC plays hell with performance-critical code;
EDIT
Note that automatic garbage collection has been around in some form since 1959, when McCarthy added it to Lisp. However, it didn't really become a thing in "mainstream" programming languages until the 1990s when Java came along; at that point, most systems were fast enough that we were willing to trade the (minor) performance hit for (somewhat) more robust code.
AGC can make life a lot easier in some respects, but it also introduces a small level of unpredictability in your code. RAII as usually practiced in C++ is a good compromise between all-manual memory management and automatic garbage collection. Done correctly, it gives you all the benefits of worry-free resource management and predictability.

Are there practical uses of C++11's Garbage Collection ABI?

C++11 introduced an interface to garbage collectors. From what I see, it provides a standardized way to communicate with the GC (e.g. declare_no_pointers), and to get information about how disguised pointers are handled (e.g., get_pointer_safety).
However, there is no standardized way in C++11 yet to allocate a raw block of memory, which you don't have to free manually. There are use cases where that would help, even if destructors are not called. One example is to implement efficient concurrent data structures (as mentioned by Herb Sutter) without having to deal with complicated cleanup protocols.
So far, so good. My question (from the perspective of an ordinary develper, not a GC library developer):
Is there a real-world example where the new C++11 GC interface has helped you?
At least from my perspective the world has not changed. If you need GC, you still have to find a non-standard library, for example Boehm GC, and learn how to integrate and use it. The new standardized interface won't help very much in that respect. It will also not solve portability issues.
(In the long term, the common interface defined by the C++11 standard hopefully pays off. However, my question targets only the immediate future.)
No, there is no currently practical usage of C++11 GC interface as there is no compiler which fully supports this API in the meantime. Also, C++11 standard declares this API as optional and there is no movement seen to implement it in the major compilers (but as Jesse Good notes MSVC already does support it).
Also you should look this post, it has related information: Why garbage collection when RAII is available?
std::shared_ptr provides what is called reference-counted garbage collection. It is simple to implement but has some drawbacks. Specifically it is less efficient than alternative forms of garbage collection in many applications, and more importantly it cannot handle cyclic references.
Java and C# are called "managed languages" as opposed to C++ which is called an "unmanaged language" mostly because they implement mark-and-sweep garbage collection. Mark-and-sweep garbage collection handles cyclic references. It does this by logically searching the graph of reachable objects, then periodically deleting those that are unreachable. There are more sophisticated algorithms that are optimizations of this (one is called "generational"), but the underlying structure is just mark-and-sweep.
The problem with implementing mark-and-sweep in C++ is that there are a lot of operations that make it difficult to track the object graph. The "safely-derived pointer" concept seeks to separate out and define these issues so that we can say which features you can use to maintain the integrity of the GCs view of the object graph. It should then be possible for a compiler to statically identify and diagnose constructs that violate these (reinterpret casts, pointer arithmetic, and so on).
Those that claim "why would you want garbage collection when you have RAII" are confused. RAII is one possible memory model which uses an ownership concept. Each object must be owned by exactly one other object, and that owner is in charge of its lifetime. For many object models this simply is not natural or conveniant, as one object is referenced by several others, and there is no clear owner. For many applications you want an objects lifetime to end automatically once it becomes unreferenced, and this is how Java and C# work "by default".
It is my impression that the new memory model and "safely-derived object" concept should lead to a real optional mark-and-sweep garbage collector to be made available in the standard library. Such a feature would be extremely welcome - but I don't think it is there yet. The "safely-derived object" stuff is a foundation for things to come.

Why doesn't C++ have a garbage collector?

I'm not asking this question because of the merits of garbage collection first of all. My main reason for asking this is that I do know that Bjarne Stroustrup has said that C++ will have a garbage collector at some point in time.
With that said, why hasn't it been added? There are already some garbage collectors for C++. Is this just one of those "easier said than done" type things? Or are there other reasons it hasn't been added (and won't be added in C++11)?
Cross links:
Garbage collectors for C++
Just to clarify, I understand the reasons why C++ didn't have a garbage collector when it was first created. I'm wondering why the collector can't be added in.
Implicit garbage collection could have been added in, but it just didn't make the cut. Probably due to not just implementation complications, but also due to people not being able to come to a general consensus fast enough.
A quote from Bjarne Stroustrup himself:
I had hoped that a garbage collector
which could be optionally enabled
would be part of C++0x, but there were
enough technical problems that I have
to make do with just a detailed
specification of how such a collector
integrates with the rest of the
language, if provided. As is the case
with essentially all C++0x features,
an experimental implementation exists.
There is a good discussion of the topic here.
General overview:
C++ is very powerful and allows you to do almost anything. For this reason it doesn't automatically push many things onto you that might impact performance. Garbage collection can be easily implemented with smart pointers (objects that wrap pointers with a reference count, which auto delete themselves when the reference count reaches 0).
C++ was built with competitors in mind that did not have garbage collection. Efficiency was the main concern that C++ had to fend off criticism from in comparison to C and others.
There are 2 types of garbage collection...
Explicit garbage collection:
C++0x has garbage collection via pointers created with shared_ptr
If you want it you can use it, if you don't want it you aren't forced into using it.
For versions before C++0x, boost:shared_ptr exists and serves the same purpose.
Implicit garbage collection:
It does not have transparent garbage collection though. It will be a focus point for future C++ specs though.
Why Tr1 doesn't have implicit garbage collection?
There are a lot of things that tr1 of C++0x should have had, Bjarne Stroustrup in previous interviews stated that tr1 didn't have as much as he would have liked.
To add to the debate here.
There are known issues with garbage collection, and understanding them helps understanding why there is none in C++.
1. Performance ?
The first complaint is often about performance, but most people don't really realize what they are talking about. As illustrated by Martin Beckett the problem may not be performance per se, but the predictability of performance.
There are currently 2 families of GC that are widely deployed:
Mark-And-Sweep kind
Reference-Counting kind
The Mark And Sweep is faster (less impact on overall performance) but it suffers from a "freeze the world" syndrome: i.e. when the GC kicks in, everything else is stopped until the GC has made its cleanup. If you wish to build a server that answers in a few milliseconds... some transactions will not live up to your expectations :)
The problem of Reference Counting is different: reference-counting adds overhead, especially in Multi-Threading environments because you need to have an atomic count. Furthermore there is the problem of reference cycles so you need a clever algorithm to detect those cycles and eliminate them (generally implement by a "freeze the world" too, though less frequent). In general, as of today, this kind (even though normally more responsive or rather, freezing less often) is slower than the Mark And Sweep.
I have seen a paper by Eiffel implementers that were trying to implement a Reference Counting Garbage Collector that would have a similar global performance to Mark And Sweep without the "Freeze The World" aspect. It required a separate thread for the GC (typical). The algorithm was a bit frightening (at the end) but the paper made a good job of introducing the concepts one at a time and showing the evolution of the algorithm from the "simple" version to the full-fledged one. Recommended reading if only I could put my hands back on the PDF file...
2. Resources Acquisition Is Initialization (RAII)
It's a common idiom in C++ that you will wrap the ownership of resources within an object to ensure that they are properly released. It's mostly used for memory since we don't have garbage collection, but it's also useful nonetheless for many other situations:
locks (multi-thread, file handle, ...)
connections (to a database, another server, ...)
The idea is to properly control the lifetime of the object:
it should be alive as long as you need it
it should be killed when you're done with it
The problem of GC is that if it helps with the former and ultimately guarantees that later... this "ultimate" may not be sufficient. If you release a lock, you'd really like that it be released now, so that it does not block any further calls!
Languages with GC have two work arounds:
don't use GC when stack allocation is sufficient: it's normally for performance issues, but in our case it really helps since the scope defines the lifetime
using construct... but it's explicit (weak) RAII while in C++ RAII is implicit so that the user CANNOT unwittingly make the error (by omitting the using keyword)
3. Smart Pointers
Smart pointers often appear as a silver bullet to handle memory in C++. Often times I have heard: we don't need GC after all, since we have smart pointers.
One could not be more wrong.
Smart pointers do help: auto_ptr and unique_ptr use RAII concepts, extremely useful indeed. They are so simple that you can write them by yourself quite easily.
When one need to share ownership however it gets more difficult: you might share among multiple threads and there are a few subtle issues with the handling of the count. Therefore, one naturally goes toward shared_ptr.
It's great, that's what Boost for after all, but it's not a silver bullet. In fact, the main issue with shared_ptr is that it emulates a GC implemented by Reference Counting but you need to implement the cycle detection all by yourself... Urg
Of course there is this weak_ptr thingy, but I have unfortunately already seen memory leaks despite the use of shared_ptr because of those cycles... and when you are in a Multi Threaded environment, it's extremely difficult to detect!
4. What's the solution ?
There is no silver bullet, but as always, it's definitely feasible. In the absence of GC one need to be clear on ownership:
prefer having a single owner at one given time, if possible
if not, make sure that your class diagram does not have any cycle pertaining to ownership and break them with subtle application of weak_ptr
So indeed, it would be great to have a GC... however it's no trivial issue. And in the mean time, we just need to roll up our sleeves.
What type? should it be optimised for embedded washing machine controllers, cell phones, workstations or supercomputers?
Should it prioritise gui responsiveness or server loading?
should it use lots of memory or lots of CPU?
C/c++ is used in just too many different circumstances.
I suspect something like boost smart pointers will be enough for most users
Edit - Automatic garbage collectors aren't so much a problem of performance (you can always buy more server) it's a question of predicatable performance.
Not knowing when the GC is going to kick in is like employing a narcoleptic airline pilot, most of the time they are great - but when you really need responsiveness!
One of the biggest reasons that C++ doesn't have built in garbage collection is that getting garbage collection to play nice with destructors is really, really hard. As far as I know, nobody really knows how to solve it completely yet. There are alot of issues to deal with:
deterministic lifetimes of objects (reference counting gives you this, but GC doesn't. Although it may not be that big of a deal).
what happens if a destructor throws when the object is being garbage collected? Most languages ignore this exception, since theres really no catch block to be able to transport it to, but this is probably not an acceptable solution for C++.
How to enable/disable it? Naturally it'd probably be a compile time decision but code that is written for GC vs code that is written for NOT GC is going to be very different and probably incompatible. How do you reconcile this?
These are just a few of the problems faced.
Though this is an old question, there's still one problem that I don't see anybody having addressed at all: garbage collection is almost impossible to specify.
In particular, the C++ standard is quite careful to specify the language in terms of externally observable behavior, rather than how the implementation achieves that behavior. In the case of garbage collection, however, there is virtually no externally observable behavior.
The general idea of garbage collection is that it should make a reasonable attempt at assuring that a memory allocation will succeed. Unfortunately, it's essentially impossible to guarantee that any memory allocation will succeed, even if you do have a garbage collector in operation. This is true to some extent in any case, but particularly so in the case of C++, because it's (probably) not possible to use a copying collector (or anything similar) that moves objects in memory during a collection cycle.
If you can't move objects, you can't create a single, contiguous memory space from which to do your allocations -- and that means your heap (or free store, or whatever you prefer to call it) can, and probably will, become fragmented over time. This, in turn, can prevent an allocation from succeeding, even when there's more memory free than the amount being requested.
While it might be possible to come up with some guarantee that says (in essence) that if you repeat exactly the same pattern of allocation repeatedly, and it succeeded the first time, it will continue to succeed on subsequent iterations, provided that the allocated memory became inaccessible between iterations. That's such a weak guarantee it's essentially useless, but I can't see any reasonable hope of strengthening it.
Even so, it's stronger than what has been proposed for C++. The previous proposal [warning: PDF] (that got dropped) didn't guarantee anything at all. In 28 pages of proposal, what you got in the way of externally observable behavior was a single (non-normative) note saying:
[ Note: For garbage collected programs, a high quality hosted implementation should attempt to maximize the amount of unreachable memory it reclaims. —end note ]
At least for me, this raises a serious question about return on investment. We're going to break existing code (nobody's sure exactly how much, but definitely quite a bit), place new requirements on implementations and new restrictions on code, and what we get in return is quite possibly nothing at all?
Even at best, what we get are programs that, based on testing with Java, will probably require around six times as much memory to run at the same speed they do now. Worse, garbage collection was part of Java from the beginning -- C++ places enough more restrictions on the garbage collector that it will almost certainly have an even worse cost/benefit ratio (even if we go beyond what the proposal guaranteed and assume there would be some benefit).
I'd summarize the situation mathematically: this a complex situation. As any mathematician knows, a complex number has two parts: real and imaginary. It appears to me that what we have here are costs that are real, but benefits that are (at least mostly) imaginary.
If you want automatic garbage collection, there are good commercial
and public-domain garbage collectors for C++. For applications where
garbage collection is suitable, C++ is an excellent garbage collected
language with a performance that compares favorably with other garbage
collected languages. See The C++ Programming Language (4rd
Edition) for a discussion of automatic garbage collection in C++.
See also, Hans-J. Boehm's site for C and C++ garbage collection (archive).
Also, C++ supports programming techniques that allow memory
management to be safe and implicit without a garbage collector. I consider garbage collection a last choice and an imperfect way of handling for resource management. That does not mean that it is never useful, just that there are better approaches in many situations.
Source: http://www.stroustrup.com/bs_faq.html#garbage-collection
As for why it doesnt have it built in, If I remember correctly it was invented before GC was the thing, and I don't believe the language could have had GC for several reasons(I.E Backwards compatibilty with C)
Hope this helps.
tl;dr: Because modern C++ doesn't need garbage collection.
Bjarne Stroustrup's FAQ answer on this matter says:
I don't like garbage. I don't like littering. My ideal is to eliminate the need for a garbage collector by not producing any garbage. That is now possible.
The situation, for code written these days (C++17 and following the official Core Guidelines) is as follows:
Most memory ownership-related code is in libraries (especially those providing containers).
Most use of code involving memory ownership follows the CADRe or RAII pattern, so allocation is made on construction and deallocation on destruction, which happens when exiting the scope in which something was allocated.
You do not explicitly allocate or deallocate memory directly.
Raw pointers do not own memory (if you've followed the guidelines), so you can't leak by passing them around.
If you're wondering how you're going to pass the starting addresses of sequences of values in memory - you can and should prefer span's, obviating the need for raw pointers. You can still use such pointers, they'll just be non-owning.
If you really need an owning "pointer", you use C++' standard-library smart pointers - they can't leak, and are decently efficient (although the ABI can get in the way of that). Alternatively, you can pass ownership across scope boundaries with "owner pointers". These are uncommon and must be used explicitly; but when adopted - they allow for nice static checking against leaks.
"Oh yeah? But what about...
... if I just write code the way we used to write C++ in the old days?"
Indeed, you could just disregard all of the guidelines and write leaky application code - and it will compile and run (and leak), same as always.
But it's not a "just don't do that" situation, where the developer is expected to be virtuous and exercise a lot of self control; it's just not simpler to write non-conforming code, nor is it faster to write, nor is it better-performing. Gradually it will also become more difficult to write, as you would face an increasing "impedance mismatch" with what conforming code provides and expects.
... if I reintrepret_cast? Or do complex pointer arithmetic? Or other such hacks?"
Indeed, if you put your mind to it, you can write code that messes things up despite playing nice with the guidelines. But:
You would do this rarely (in terms of places in the code, not necessarily in terms of fraction of execution time)
You would only do this intentionally, not accidentally.
Doing so will stand out in a codebase conforming to the guidelines.
It's the kind of code in which you would bypass the GC in another language anyway.
... library development?"
If you're a C++ library developer then you do write unsafe code involving raw pointers, and you are required to code carefully and responsibly - but these are self-contained pieces of code written by experts (and more importantly, reviewed by experts).
So, it's just like Bjarne said: There's really no motivation to collect garbage generally, as you all but make sure not to produce garbage. GC is becoming a non-problem with C++.
That is not to say GC isn't an interesting problem for certain specific applications, when you want to employ custom allocation and de-allocations strategies. For those you would want custom allocation and de-allocation, not a language-level GC.
Stroustrup made some good comments on this at the 2013 Going Native conference.
Just skip to about 25m50s in this video. (I'd recommend watching the whole video actually, but this skips to the stuff about garbage collection.)
When you have a really great language that makes it easy (and safe, and predictable, and easy-to-read, and easy-to-teach) to deal with objects and values in a direct way, avoiding (explicit) use of the heap, then you don't even want garbage collection.
With modern C++, and the stuff we have in C++11, garbage collection is no longer desirable except in limited circumstances. In fact, even if a good garbage collector is built into one of the major C++ compilers, I think that it won't be used very often. It will be easier, not harder, to avoid the GC.
He shows this example:
void f(int n, int x) {
Gadget *p = new Gadget{n};
if(x<100) throw SomeException{};
if(x<200) return;
delete p;
}
This is unsafe in C++. But it's also unsafe in Java! In C++, if the function returns early, the delete will never be called. But if you had full garbage collection, such as in Java, you merely get a suggestion that the object will be destructed "at some point in the future" (Update: it's even worse that this. Java does not promise to call the finalizer ever - it maybe never be called). This isn't good enough if Gadget holds an open file handle, or a connection to a database, or data which you have buffered for write to a database at a later point. We want the Gadget to be destroyed as soon as it's finished, in order to free these resources as soon as possible. You don't want your database server struggling with thousands of database connections that are no longer needed - it doesn't know that your program is finished working.
So what's the solution? There are a few approaches. The obvious approach, which you'll use for the vast majority of your objects is:
void f(int n, int x) {
Gadget p = {n}; // Just leave it on the stack (where it belongs!)
if(x<100) throw SomeException{};
if(x<200) return;
}
This takes fewer characters to type. It doesn't have new getting in the way. It doesn't require you to type Gadget twice. The object is destroyed at the end of the function. If this is what you want, this is very intuitive. Gadgets behave the same as int or double. Predictable, easy-to-read, easy-to-teach. Everything is a 'value'. Sometimes a big value, but values are easier to teach because you don't have this 'action at a distance' thing that you get with pointers (or references).
Most of the objects you make are for use only in the function that created them, and perhaps passed as inputs to child functions. The programmer shouldn't have to think about 'memory management' when returning objects, or otherwise sharing objects across widely separated parts of the software.
Scope and lifetime are important. Most of the time, it's easier if the lifetime is the same as the scope. It's easier to understand and easier to teach. When you want a different lifetime, it should be obvious reading the code that you're doing this, by use of shared_ptr for example. (Or returning (large) objects by value, leveraging move-semantics or unique_ptr.
This might seem like an efficiency problem. What if I want to return a Gadget from foo()? C++11's move semantics make it easier to return big objects. Just write Gadget foo() { ... } and it will just work, and work quickly. You don't need to mess with && yourself, just return things by value and the language will often be able to do the necessary optimizations. (Even before C++03, compilers did a remarkably good job at avoiding unnecessary copying.)
As Stroustrup said elsewhere in the video (paraphrasing): "Only a computer scientist would insist on copying an object, and then destroying the original. (audience laughs). Why not just move the object directly to the new location? This is what humans (not computer scientists) expect."
When you can guarantee only one copy of an object is needed, it's much easier to understand the lifetime of the object. You can pick what lifetime policy you want, and garbage collection is there if you want. But when you understand the benefits of the other approaches, you'll find that garbage collection is at the bottom of your list of preferences.
If that doesn't work for you, you can use unique_ptr, or failing that, shared_ptr. Well written C++11 is shorter, easier-to-read, and easier-to-teach than many other languages when it comes to memory management.
The idea behind C++ was that you would not pay any performance impact for features that you don't use. So adding garbage collection would have meant having some programs run straight on the hardware the way C does and some within some sort of runtime virtual machine.
Nothing prevents you from using some form of smart pointers that are bound to some third-party garbage collection mechanism. I seem to recall Microsoft doing something like that with COM and it didn't go to well.
To answer most "why" questions about C++, read Design and Evolution of C++
One of the fundamental principles behind the original C language is that memory is composed of a sequence of bytes, and code need only care about what those bytes mean at the exact moment that they are being used. Modern C allows compilers to impose additional restrictions, but C includes--and C++ retains--the ability to decompose a pointer into a sequence of bytes, assemble any sequence of bytes containing the same values into a pointer, and then use that pointer to access the earlier object.
While that ability can be useful--or even indispensable--in some kinds of applications, a language that includes that ability will be very limited in its ability to support any kind of useful and reliable garbage collection. If a compiler doesn't know everything that has been done with the bits that made up a pointer, it will have no way of knowing whether information sufficient to reconstruct the pointer might exist somewhere in the universe. Since it would be possible for that information to be stored in ways that the computer wouldn't be able to access even if it knew about them (e.g. the bytes making up the pointer might have been shown on the screen long enough for someone to write them down on a piece of paper), it may be literally impossible for a computer to know whether a pointer could possibly be used in the future.
An interesting quirk of many garbage-collected frameworks is that an object reference not defined by the bit patterns contained therein, but by the relationship between the bits held in the object reference and other information held elsewhere. In C and C++, if the bit pattern stored in a pointer identifies an object, that bit pattern will identify that object until the object is explicitly destroyed. In a typical GC system, an object may be represented by a bit pattern 0x1234ABCD at one moment in time, but the next GC cycle might replace all references to 0x1234ABCD with references to 0x4321BABE, whereupon the object would be represented by the latter pattern. Even if one were to display the bit pattern associated with an object reference and then later read it back from the keyboard, there would be no expectation that the same bit pattern would be usable to identify the same object (or any object).
SHORT ANSWER:
We don't know how to do garbage collection efficiently (with minor time and space overhead) and correctly all the time (in all possible cases).
LONG ANSWER:
Just like C, C++ is a systems language; this means it is used when you are writing system code, e.g., operating system. In other words, C++ is designed, just like C, with best possible performance as the main target. The language' standard will not add any feature that might hinder the performance objective.
This pauses the question: Why garbage collection hinders performance? The main reason is that, when it comes to implementation, we [computer scientists] do not know how to do garbage collection with minimal overhead, for all cases. Hence it's impossible to the C++ compiler and runtime system to perform garbage collection efficiently all the time. On the other hand, a C++ programmer, should know his design/implementation and he's the best person to decide how to best do the garbage collection.
Last, if control (hardware, details, etc.) and performance (time, space, power, etc.) are not the main constraints, then C++ is not the right tool. Other language might serve better and offer more [hidden] runtime management, with the necessary overhead.
All the technical talking is overcomplicating the concept.
If you put GC into C++ for all the memory automatically then consider something like a web browser. The web browser must load a full web document AND run web scripts. You can store web script variables in the document tree. In a BIG document in a browser with lots of tabs open, it means that every time the GC must do a full collection it must also scan all the document elements.
On most computers this means that PAGE FAULTS will occur. So the main reason, to answer the question is that PAGE FAULTS will occur. You will know this as when your PC starts making lots of disk access. This is because the GC must touch lots of memory in order to prove invalid pointers. When you have a bona fide application using lots of memory, having to scan all objects every collection is havoc because of the PAGE FAULTS. A page fault is when virtual memory needs to get read back into RAM from disk.
So the correct solution is to divide an application into the parts that need GC and the parts that do not. In the case of the web browser example above, if the document tree was allocated with malloc, but the javascript ran with GC, then every time the GC kicks in it only scans a small portion of memory and all PAGED OUT elements of the memory for the document tree does not need to get paged back in.
To further understand this problem, look up on virtual memory and how it is implemented in computers. It is all about the fact that 2GB is available to the program when there is not really that much RAM. On modern computers with 2GB RAM for a 32BIt system it is not such a problem provided only one program is running.
As an additional example, consider a full collection that must trace all objects. First you must scan all objects reachable via roots. Second scan all the objects visible in step 1. Then scan waiting destructors. Then go to all the pages again and switch off all invisible objects. This means that many pages might get swapped out and back in multiple times.
So my answer to bring it short is that the number of PAGE FAULTS which occur as a result of touching all the memory causes full GC for all objects in a program to be unfeasible and so the programmer must view GC as an aid for things like scripts and database work, but do normal things with manual memory management.
And the other very important reason of course is global variables. In order for the collector to know that a global variable pointer is in the GC it would require specific keywords, and thus existing C++ code would not work.
When we compare C++ with Java, we see that C++ was not designed with implicit Garbage Collection in mind, while Java was.
Having things like arbitrary pointers in C-Style is not only bad for GC-implementations, but it would also destroy backward compatibility for a large amount of C++-legacy-code.
In addition to that, C++ is a language that is intended to run as standalone executable instead of having a complex run-time environment.
All in all:
Yes it might be possible to add Garbage Collection to C++, but for the sake of continuity it is better not to do so.
Mainly for two reasons:
Because it doesn't need one (IMHO)
Because it's pretty much incompatible with RAII, which is the cornerstone of C++
C++ already offers manual memory management, stack allocation, RAII, containers, automatic pointers, smart pointers... That should be enough. Garbage collectors are for lazy programmers who don't want to spend 5 minutes thinking about who should own which objects or when should resources be freed. That's not how we do things in C++.
Imposing garbage collection is really a low level to high level paradigm shift.
If you look at the way strings are handled in a language with garbage collection, you will find they ONLY allow high level string manipulation functions and do not allow binary access to the strings. Simply put, all string functions first check the pointers to see where the string is, even if you are only drawing out a byte. So if you are doing a loop that processes each byte in a string in a language with garbage collection, it must compute the base location plus offset for each iteration, because it cannot know when the string has moved. Then you have to think about heaps, stacks, threads, etc etc.

What can C++ do that is too hard or messy in any other language?

I still feel C++ offers some things that can't be beaten. It's not my intention to start a flame war here, please, if you have strong opinions about not liking C++ don't vent them here. I'm interested in hearing from C++ gurus about why they stick with it.
I'm particularly interested in aspects of C++ that are little known, or underutilised.
RAII / deterministic finalization. No, garbage collection is not just as good when you're dealing with a scarce, shared resource.
Unfettered access to OS APIs.
I have stayed with C++ as it is still the highest performing general purpose language for applications that need to combine efficiency and complexity. As an example, I write real time surface modelling software for hand-held devices for the surveying industry. Given the limited resources, Java, C#, etc... just don't provide the necessary performance characteristics, whereas lower level languages like C are much slower to develop in given the weaker abstraction characteristics. The range of levels of abstraction available to a C++ developer is huge, at one extreme I can be overloading arithmetic operators such that I can say something like MaterialVolume = DesignSurface - GroundSurface while at the same time running a number of different heaps to manage the memory most efficiently for my app on a specific device. Combine this with a wealth of freely available source for solving pretty much any common problem, and you have one heck of a powerful development language.
Is C++ still the optimal development solution for most problems in most domains? Probably not, though at a pinch it can still be used for most of them. Is it still the best solution for efficient development of high performance applications? IMHO without a doubt.
Shooting oneself in the foot.
No other language offers such a creative array of tools. Pointers, multiple inheritance, templates, operator overloading and a preprocessor.
A wonderfully powerful language that also provides abundant opportunities for foot shooting.
Edit: I apologize if my lame attempt at humor has offended some. I consider C++ to be the most powerful language that I have ever used -- with abilities to code at the assembly language level when desired, and at a high level of abstraction when desired. C++ has been my primary language since the early '90s.
My answer was based on years of experience of shooting myself in the foot. At least C++ allows me to do so elegantly.
Deterministic object destruction leads to some magnificent design patterns. For instance, while RAII is not as general a technique as garbage collection, it leads to some impressive capabilities which you cannot get with GC.
C++ is also unique in that it has a Turing-complete preprocessor. This allows you to prefer (as in the opposite of defer) a lot of code tasks to compile time instead of run time. For instance, in real code you might have an assert() statement to test for a never-happen. The reality is that it will sooner or later happen... and happen at 3:00am when you're on vacation. The C++ preprocessor assert does the same test at compile time. Compile-time asserts fail between 8:00am and 5:00pm while you're sitting in front of the computer watching the code build; run-time asserts fail at 3:00am when you're asleep in Hawai'i. It's pretty easy to see the win there.
In most languages, strategy patterns are done at run-time and throw exceptions in the event of a type mismatch. In C++, strategies can be done at compile-time through the preprocessor facility and can be guaranteed typesafe.
Write inline assembly (MMX, SSE, etc.).
Deterministic object destruction. I.e. real destructors. Makes managing scarce resources easier. Allows for RAII.
Easier access to structured binary data. It's easier to cast a memory region as a struct than to parse it and copy each value into a struct.
Multiple inheritance. Not everything can be done with interfaces. Sometimes you want to inherit actual functionality too.
I think i'm just going to praise C++ for its ability to use templates to catch expressions and execute it lazily when it's needed. For those not knowing what this is about, here is an example.
Template mixins provide reuse that I haven't seen elsewhere. With them you can build up a large object with lots of behaviour as though you had written the whole thing by hand. But all these small aspects of its functionality can be reused, it's particularly great for implementing parts of an interface (or the whole thing), where you are implementing a number of interfaces. The resulting object is lightning-fast because it's all inlined.
Speed may not matter in many cases, but when you're writing component software, and users may combine components in unthought-of complicated ways to do things, the speed of inlining and C++ seems to allow much more complex structures to be created.
Absolute control over the memory layout, alignment, and access when you need it. If you're careful enough you can write some very cache-friendly programs. For multi-processor programs, you can also eliminate a lot of slow downs from cache coherence mechanisms.
(Okay, you can do this in C, assembly, and probably Fortran too. But C++ lets you write the rest of your program at a higher level.)
This will probably not be a popular answer, but I think what sets C++ apart are its compile-time capabilities, e.g. templates and #define. You can do all sorts of text manipulation on your program using these features, much of which has been abandoned in later languages in the name of simplicity. To me that's way more important than any low-level bit fiddling that's supposedly easier or faster in C++.
C#, for instance, doesn't have a real macro facility. You can't #include another file directly into the source, or use #define to manipulate the program as text. Think about any time you had to mechanically type repetitive code and you knew there was a better way. You may even have written a program to generate code for you. Well, the C++ preprocessor automates all of these things.
The "generics" facility in C# is similarly limited compared to C++ templates. C++ lets you apply the dot operator to a template type T blindly, calling (for example) methods that may not exist, and checks-for-correctness are only applied once the template is actually applied to a specific class. When that happens, if all the assumptions you made about T actually hold, then your code will compile. C# doesn't allow this... type "T" basically has to be dealt with as an Object, i.e. using only the lowest common denominator of operations available to everything (assignment, GetHashCode(), Equals()).
C# has done away with the preprocessor, and real generics, in the name of simplicity. Unfortunately, when I use C#, I find myself reaching for substitutes for these C++ constructs, which are inevitably more bloated and layered than the C++ approach. For example, I have seen programmers work around the absence of #include in several bloated ways: dynamically linking to external assemblies, re-defining constants in several locations (one file per project) or selecting constants from a database, etc.
As Ms. Crabapple from The Simpson's once said, this is "pretty lame, Milhouse."
In terms of Computer Science, these compile-time features of C++ enable things like call-by-name parameter passing, which is known to be more powerful than call-by-value and call-by-reference.
Again, this is perhaps not the popular answer- any introductory C++ text will warn you off of #define, for example. But having worked with a wide variety of languages over many years, and having given consideration to the theory behind all of this, I think that many people are giving bad advice. This seems especially to be the case in the diluted sub-field known as "IT."
Passing POD structures across processes with minimum overhead. In other words, it allows us to easily handle blobs of binary data.
C# and Java force you to put your 'main()' function in a class. I find that weird, because it dilutes the meaning of a class.
To me, a class is a category of objects in your problem domain. A program is not such an object. So there should never be a class called 'Program' in your program. This would be equivalent to a mathematical proof using a symbol to notate itself -- the proof -- alongside symbols representing mathematical objects. It'll be just weird and inconsistent.
Fortunately, unlike C# and Java, C++ allows global functions. That lets your main() function to exist outside. Therefore C++ offers a simpler, more consistent and perhaps truer implementation of the the object-oriented idiom. Hence, this is one thing C++ can do, but C# and Java cannot.
I think that operator overloading is a quite nice feature. Of course it can be very much abused (like in Boost lambda).
Tight control over system resources (esp. memory) while offering powerful abstraction mechanisms optionally. The only language I know of that can come close to C++ in this regard is Ada.
C++ provides complete control over memory and as result a makes the the flow of program execution much more predictable.
Not only can you say precisely at what time allocations and deallocations of memory occurs, you can define you own heaps, have multiple heaps for different purposes and say precisely where in memory data is allocated to. This is frequently useful when programming on embedded/real time systems, such as games consoles, cell phones, mp3 players, etc..., which:
have strict upper limits on memory that is easy to reach (constrast with a PC which just gets slower as you run out of physical memory)
frequently have non homogeneous memory layout. You may want to allocate objects of one type in one piece of physical memory, and objects of another type in another piece.
have real time programming constraints. Unexpectedly calling the garbage collector at the wrong time can be disastrous.
AFAIK, C and C++ are the only sensible option for doing this kind of thing.
Well to be quite honest, you can do just about anything if your willing to write enough code.
So to answer your question, no, there is nothing you can't do in another language that C++ can't do. It's just how much patience do you have and are you willing to devote the long sleepless nights to get it to work?
There are things that C++ wrappers make it easy to do (because they can read the header files), like Office development. But again, it's because someone wrote lots of code to "wrap" it for you in an RCW or "Runtime Callable Wrapper"
EDIT: You also realize this is a loaded question.

Garbage collection Libraries in C++ [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this question
What free and commercial garbage collection libraries are available for C++, and what are the pros and cons of each?
I am interested in hard-won lessons from actual use in the field, not marketing or promotional blurb.
There is no need to elaborate on the usual trade offs associated with automatic garbage collection, but please do mention the algorithms used (reference counting, mark and sweep, incremental, etc.) and briefly summarise the consequences.
I have used the Boehm collector in the past with good success. It's open source and can be used in commercial software.
It's a conservative collector, and has a long history of development by one of the foremost researchers in garbage collection technology.
Boost has a great range of smart pointers which impliment reference counting or delete-on-scope exit or intrusive reference counting. These have proven enough for our needs. A big plus is that it is all free, open source, templated C++. because it is reference counting, in most cases it is highly deterministic when an object gets destroyed.
The Boehm garbage collector is freely available, and supposedly rather good (no first hand experience myself)
Theoretical paper (in PDF) about C++0x proposal for the Boehm garbage collector
It was originally said to make C++0x , but will not make it after all (due to time constraints I suppose).
Proprosal N2670 (minimal support for garbage collectors) did get approved in june 2008 though, so as compiler implementations pick up on this, and the standard gets finalised, the garbage collection world out there for C++ is sure to change...
I use boehm-gc a lot. It is straight-forward to use, but the documentation is really poor. There is a C++ page, but its quite hard to find.
Basically, you just make sure that every class inherits from their base class, and that you always pass gc_allocator to a container. In a number of cases you want to use libgccpp to catch other uses of new and delete. These are largely high-level changes, and we find that we can turn off the GC at compile-time using an #ifdef, and that supporting this only affects one or two files.
My major problem with it is that you can no longer use Valgrind, unless you turn the collector off first. While turning the collector off is easy to do, and doesn't require recompiling, it's obviously impossible to use it if you start to run out of memory.
The only one I know of is Boehm, which at the bottom is a traditional mark and sweep. It probably uses various techniques to optimize this, but typically incremental/generational/compacting GC's will be hard to create for C++ without going for a managed subset such as what you can get with .Net C++. Some of the approaches that needs to move pointers can be implemented with compiler support for pinning pointers or read/write blocks though, but the effect on performance may be too big, and it isn't necessarily non-trivial changes to the GC.
The major difficulty with GC's in C++ is the need to handle uncooperative modules, in the GC sense. ie, to deal with libraries that were never written with GC's in mind.
This is why the Boehm GC is often suggested.
You can also use Microsoft's Managed C++. The CLR and the GC are very solid and used in server products, but you have to use CLR types for the GC to actually collect - you can't just recompile your existing code and remove all the delete statements.
I would rather use C# to write brand new code, but Managed C++ lets you evolve your code base in a more progressive manner.
Read this and take a good look at the conclusions:
Conclusions
Complex solution to problem for which simple solutions are widely used and will be improved by C++0x leaving us little need.
We have little to no experience with the recommended language features which are to be standardized.
Fixing bad software complex system will never work.
Recommend minor language changes to improve future GC support - disallow hiding of pointers (xor list trick) as one example.
Finally - address the "C++ is bad because it has no GC" argument head-on. C++ doesn't generate garbage and so has no need for GC. Clearly Java, C#, Objective C, etc. generate lots of garbage.
Yes the last sentence is subjective and also a part of the holy wars.
I use C++ because I dislike the idea that someone needs to take out the garbage for me.
The city hall does that and that's enough for me.
If you need GC use another language. Pick the right tool for the right job.
Here's a commercial product I found in just looking for this same thing
http://www.harnixtechnologies.ca/hnxgc/
Back in the day, there was also a product called Great Circle from Geodesic Systems, but doesn't look like they sell that anymore. No idea if the sold the product to anyone else.