Can RAII efficiently share immutable objects between threads without synchronization - c++

In the countless arguments about the superiority of C++-style deterministic destruction (RAII) versus garbage collection, proponents of the former often suggest that it can do everything garbage collection can do. There is, however, a pattern which is used frequently in Java and .NET languages, but for which I know of no nice pattern in C++. Consider the following class [could be Java or C#; using public fields for brevity]:
public class MaxItemIdentifier
{
    public String maxItemName = null;
    public long maxItemValue = -0x7FFFFFFFFFFFFFFFL - 1;

    public void checkItem(String name, long value)
    {
        if (value > maxItemValue)
        {
            maxItemName = name;
            maxItemValue = value;
        }
    }
}
In both Java and C#, the above method may safely be passed a string that was created on any thread. Further, while the method is not thread-safe, even improper threading usage will not jeopardize memory safety; maxItemName would still be guaranteed to either be null or identify one of the strings passed to checkItem. Additionally, the method never actually has to copy (or even look at) the contents of any string; all it acts upon are string references. Since no string object whose reference has been exposed to the outside world will ever be modified, a reference to a string may be considered synonymous with the sequence of characters identified thereby, and copying the reference is equivalent to copying the text.
Would there be any way to write an equivalent class in C++ or a similar RAII-based language which would guarantee memory safety regardless of threading usage, but would not be needlessly inefficient when run from a single thread? The only approaches I'm aware of by which such a method could be made to work in C++ are the following:
Whenever encountering an item whose "value" is larger than the previous maximum, the method copies the contents of the string; this would be slower than simply copying a reference. Further, I don't know how well this could maintain memory safety in the presence of improper threading usage.
Have the method receive a reference to a reference-counted pointer, and hold a variable of that type; when receiving a value which is larger than the previous maximum, atomically increment the reference count on the received pointer and atomically decrement the reference count on the previous maximum-item name; if the latter yields zero, release that name. This approach would seem safe, but on many platforms atomic increments and decrements would be excessively expensive for single-threaded usage, yet would be necessary for memory safety in multi-threading scenarios. (A sketch of this approach using std::shared_ptr is given below.)
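For reference, the second approach maps fairly directly onto std::shared_ptr; here is a minimal sketch, where the class and member names mirror the example above and the choice of shared_ptr<const std::string> as the "reference to an immutable string" is mine:

#include <limits>
#include <memory>
#include <string>

class MaxItemIdentifier {
public:
    std::shared_ptr<const std::string> maxItemName;   // null until an item is seen
    long long maxItemValue = std::numeric_limits<long long>::min();

    // Copying/moving the shared_ptr only touches an atomic reference count;
    // the string contents are never copied or even examined.
    void checkItem(std::shared_ptr<const std::string> name, long long value) {
        if (value > maxItemValue) {
            maxItemName = std::move(name);
            maxItemValue = value;
        }
    }
};

Note that, unlike the Java/C# version, this still isn't memory-safe under racy use: two threads assigning maxItemName concurrently is a data race in C++, which is exactly the gap the question is about, and every copy of the shared_ptr pays for atomic reference-count updates even in single-threaded use.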
Personally, I believe a good language/framework should support both RAII and GC, since each can handle very easily and efficiently some things which the other really can't handle at all. It's possible, though, that there would be some other approaches to handling such things in RAII that I'm unfamiliar with. Is there any way when using RAII to make a method like the above work efficiently when used in single-threading scenarios but also be usable in scenarios where a reference to a string created on one thread might then be exposed to other threads?
Note that unlike some other multi-threading RAII scenarios in which objects have a predictable lifetime, being consistently created in a producer thread and destroyed in a consumer thread (the subject of a related post), references to immutable objects like Strings are often shared without any reference-holder being identifiable as an "owner", and without any way of knowing if or when any particular reference-holder might overwrite the last surviving reference to a string.

This is certainly a known concern.
You can't get deterministic destruction of shared objects without some overhead. The reason that destruction comes out on top of finalization in so many cases is that:
Most objects are not shared.
Many shared objects require deterministic destruction, and the costs of providing that on top of finalization are even higher than the costs of using RAII to implement reference counting.
And clearly RAII does a great job of managing destruction.
It's really only objects that are shared among multiple unsynchronized users and will be abandoned and never used again where finalization outshines destruction. An example would be zero-copy multicast sockets (or at least O(1) copy).
The tradeoff is still between determinism (with some counting overhead) and non-determinism. Because C and C++ don't enforce a single resource management method, it's actually easier to mix destructor-based deterministic cleanup with efficient non-deterministic cleanup than it would be on top of the .NET or Java runtimes where everything undergoes non-deterministic deallocation.
An example of non-deterministic cleanup in the native world is the RCU method used in the Linux kernel. It's C code, but applies equally well to C++.
So even here the advantage goes to RAII; you just use a different set of smart pointers for non-deterministic RCU than the ones you use for local-scope deterministic release, or for reference-counted, thread-synchronized deterministic release.
Really, that's where your thinking went wrong. RAII is able to provide deterministic lifetime, but it is not limited to deterministic lifetime.

Related

How to avoid null-pointers in multi-threading on the same List?

Say, I have a vector of dynamic object pointers and have different threads working on those objects.
It is possible that while one thread is working on an object, the main thread is deleting it. It does this by setting a flag in the object to mark it for deletion and then starting to free up its memory.
I have thought about taking care of this by checking for the flag before each single access to the object, but theoretically the following could happen (example code for illustration; although I am trying to make it reflect the situation as closely as possible, there could still be errors in it):
object = copyPointerFromVector(someIndex);
if (!object->markedForDeletion) {
    // <-- flag is set here; the object is cleaned up by the main thread
    //     and erased from the vector
    object->getValues(something); // crash with access violation
}
While it is probably rare, this is of course still unacceptable. As someone obviously very, very rusty with multi-threading, what is the right way of solving this issue?
Note up front: I assume you know about synchronization (mutexes, condition variables, atomics etc), i.e. the primitive building blocks used for multithreaded programming and that your question is about how to use them. You need those basics.
The problem you have is basically one of unclear ownership, not one of synchronization. Of course, ownership between threads requires synchronization, so it is also involved. Still, when one part of your program is destroying a shared object while another part is still using it, it's because that part wrongly assumes it is the sole owner and may dispose of the object. More generally, in multithreading terms, you could also say that it changes data structures without synchronization, but this case is special enough that there are dedicated tools to deal with it.
The tools to deal with this are called reference counting and garbage collection. Of those, the easiest to apply is probably reference counting. For that, all you need is a smart pointer that keeps track of the number of owners of an object. For example, std::shared_ptr gives you exactly that, and it manages the reference count in a thread-safe way. In order to "delete" an object from the mentioned vector, you just remove the smart pointer. If the refcount drops to zero, that was the last reference and the object gets deleted for you.
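A minimal sketch of what that might look like, assuming an illustrative Object type and a mutex that guards only the vector itself (all names here are made up for the example):

#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

struct Object {
    void getValues() { /* ... */ }
};

std::vector<std::shared_ptr<Object>> objects;
std::mutex objectsMutex;   // protects the vector, not the pointed-to objects

void workerUse(std::size_t index) {
    std::shared_ptr<Object> obj;
    {
        std::lock_guard<std::mutex> lock(objectsMutex);
        obj = objects[index];               // take shared ownership
    }
    obj->getValues();                       // safe even if the main thread erases it now
}

void mainThreadDelete(std::size_t index) {
    std::lock_guard<std::mutex> lock(objectsMutex);
    objects.erase(objects.begin() + static_cast<std::ptrdiff_t>(index));
    // the Object itself is destroyed only when the last owner lets go
}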
Garbage collection is a bit more complex. It involves scanning your process memory for references/pointers to objects and deleting the objects which aren't referenced any more. This requires installing a garbage collector though and it's a more complex change to an existing program.

Feasibility of automatic cycle breaker for `std::shared_ptr`

C++11 introduced reference-counted smart pointers, std::shared_ptr. Being reference counted, these pointers are unable to automatically reclaim cyclic data structures. However, automatic collection of reference cycles was shown to be possible, for example by Python and PHP. To distinguish this technique from garbage collection, the rest of the question will refer to it as cycle breaking.
Given that there seem to be no proposals to add equivalent functionality to C++, is there a fundamental reason why a cycle breaker similar to the ones already deployed in other languages wouldn't work for std::shared_ptr?
Note that this question doesn't boil down to "why isn't there a GC for C++", which has been asked before. A C++ GC normally refers to a system that automatically manages all dynamically allocated objects, typically implemented using some form of Boehm's conservative collector. It has been pointed out that such a collector is not a good match for RAII. Since a garbage collector primarily manages memory, and might not even be called until there is a memory shortage, and C++ destructors manage other resources, relying on the GC to run destructors would introduce non-determinism at best and resource starvation at worst. It has also been pointed out that a full-blown GC is largely unnecessary in the presence of the more explicit and predictable smart pointers.
However, a library-based cycle breaker for smart pointers (analogous to the one used by reference-counted interpreters) would have important differences from a general-purpose GC:
It only cares about objects managed through shared_ptr. Such objects already participate in shared ownership, and thus have to handle delayed destructor invocation, whose exact timing depends on ownership structure.
Due to its limited scope, a cycle breaker is unconcerned with patterns that break or slow down Boehm GC, such as pointer masking or huge opaque heap blocks that contain an occasional pointer.
It can be opt-in, like std::enable_shared_from_this. Objects that don't use it don't have to pay for the additional space in the control block to hold the cycle breaker metadata.
A cycle breaker doesn't require a comprehensive list of "root" objects, which is hard to obtain in C++. Unlike a mark-sweep GC which finds all live objects and discards the rest, a cycle breaker only traverses objects that can form cycles. In existing implementations, the type needs to provide help in the form of a function that enumerates references (direct or indirect) to other objects that can participate in a cycle.
It relies on regular "destroy when reference count drops to zero" semantics to destroy cyclic garbage. Once a cycle is identified, the objects that participate in it are requested to clear their strongly-held references, for example by calling reset(). This is enough to break the cycle and would automatically destroy the objects. Asking the objects to provide and clear their strongly-held references (on request) makes sure that the cycle breaker does not break encapsulation. (A rough sketch of such an opt-in interface follows this list.)
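To make the opt-in idea concrete, here is a rough sketch of the kind of interface the question has in mind; every name here is hypothetical, and nothing like it exists in the standard library:

#include <functional>
#include <memory>

struct cycle_participant {
    virtual ~cycle_participant() = default;
    // Report every strongly-held shared_ptr that could participate in a cycle.
    // For each one, hand the cycle breaker a callback that clears it; the
    // breaker invokes it only on references it has identified as cyclic garbage.
    virtual void enumerate_strong_refs(
        const std::function<void(const void* pointee,
                                 std::function<void()> clear)>& visit) = 0;
};

struct Node : cycle_participant {
    std::shared_ptr<Node> next;   // may point back into a cycle

    void enumerate_strong_refs(
        const std::function<void(const void*, std::function<void()>)>& visit) override {
        visit(next.get(), [this] { next.reset(); });
    }
};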
Lack of proposals for automatic cycle breaking indicates that the idea was rejected for practical or philosophical reasons. I am curious as to what the reasons are. For completeness, here are some possible objections:
"It would introduce non-deterministic destruction of cyclic shared_ptr objects." If the programmer were in control of the cycle breaker's invocation, it would not be non-deterministic. Also, once invoked, the cycle breaker's behavior would be predictable - it would destroy all currently known cycles. This is akin to how shared_ptr destructor destroys the underlying object once its reference count drops to zero, despite the possibility of this causing a "non-deterministic" cascade of further destructions.
"A cycle breaker, just like any other form of garbage collection, would introduce pauses in program execution." Experience with runtimes that implement this feature shows that the pauses are minimal because the GC only handles cyclic garbage, and all other objects are reclaimed by reference counting. If the cycle detector is never invoked automatically, the cycle breaker's "pause" could be a predictable consequence of running it, similar to how destroying a large std::vector might run a large number of destructors. (In Python, the cyclic gc is run automatically, but there is API to disable it temporarily in code sections where it is not needed. Re-enabling the GC later will pick up all cyclic garbage created in the meantime.)
"A cycle breaker is unnecessary because cycles are not that frequent and they can be easily avoided using std::weak_ptr." Cycles in fact turn up easily in many simple data structures - e.g. a tree where children have a back-pointer to the parent, or a doubly-linked list. In some cases, cycles between heterogenous objects in complex systems are formed only occasionally with certain patterns of data and are hard to predict and avoid. In some cases it is far from obvious which pointer to replace with the weak variant.
There are a number of issues to be discussed here, so I've rewritten my post to better condense this information.
Automatic cycle detection
Your idea is to have a circle_ptr smart pointer (I know you want to add it to shared_ptr, but it's easier to talk about a new type to compare the two). The idea is that, if the type that the smart pointer is bound to derives from some cycle_detector_mixin, this activates automatic cycle detection.
This mixin also requires that the type implement an interface. It must provide the ability to enumerate all of the circle_ptr instances directly owned by that instance. And it must provide the means to invalidate one of them.
I submit that this is a highly impractical solution to this problem. It is excessively fragile and requires immense amounts of manual work from the user. And therefore, it is not appropriate for inclusion in the standard library. And here are some reasons why.
Determinism and cost
"It would introduce non-deterministic destruction of cyclic shared_ptr objects." Cycle detection only happens when a shared_ptr's reference count drops to zero, so the programmer is in control of when it happens. It would therefore not be non-deterministic. Its behavior would be predictable - it would destroy all currently known cycles from that pointer. This is akin to how shared_ptr destructor destroys the underlying object once its reference count drops to zero, despite the possibility of this causing a "non-deterministic" cascade of further destructions.
This is true, but not in a helpful way.
There is a substantial difference between the determinism of regular shared_ptr destruction and the determinism of what you suggest. Namely: shared_ptr is cheap.
shared_ptr's destructor does an atomic decrement, followed by a conditional test to see if the value was decremented to zero. If so, a destructor is called and memory is freed. That's it.
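For comparison, here is roughly what that boils down to. This is a simplified sketch of the idea only, not the actual libstdc++/libc++ implementation, and the member names are illustrative:

#include <atomic>

struct control_block {
    std::atomic<long> refcount{1};
    void destroy_object() { /* would run the deleter / destructor */ }
    void release_self()   { /* would free the control block once no weak_ptrs remain */ }
};

struct shared_ptr_sketch {
    control_block* ctrl = nullptr;

    ~shared_ptr_sketch() {
        // One atomic decrement, one branch; cleanup work only for the last owner.
        if (ctrl && ctrl->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            ctrl->destroy_object();
            ctrl->release_self();
        }
    }
};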
What you suggest makes this more complicated. Worst-case, every time a circle_ptr is destroyed, the code will have to walk through data structures to determine if there's a cycle. Most of the time, cycles won't exist. But it still has to look for them, just to make sure. And it must do so every single time you destroy a circle_ptr.
Python et al. get around this problem because they are built into the language. They are able to see everything that's going on. And therefore, they can detect when a pointer is assigned at the time those assignments are made. In this way, such systems are constantly doing small amounts of work to build up cyclic chains. Once a reference goes away, it can look at its data structures and take action if that creates a cyclical chain.
But what you're suggesting is a library feature, not a language feature. And library types can't really do that. Or rather, they can, but only with help.
Remember: an instance of circle_ptr cannot know the subobject it is a member of. It cannot automatically transform a pointer to itself into a pointer to its owning class. And without that ability, it cannot update the data structures in the cycle_detector_mixin that owns it if it is reassigned.
Now, it could manually do this, but only with help from its owning instance. Which means that circle_ptr would need a set of constructors that are given a pointer to its owning instance, which derives from cycle_detector_mixin. And then, its operator= would be able to inform its owner that it has been updated. Obviously, the copy/move assignment would not copy/move the owning instance pointer.
Of course, this requires the owning instance to give a pointer to itself to every circle_ptr that it creates. In every constructor and function that creates circle_ptr instances, within itself and within any classes it owns which are not themselves managed by cycle_detector_mixin. Without fail. This creates a degree of fragility in the system; manual effort must be expended for each circle_ptr instance owned by a type.
This also requires that circle_ptr contain 3 pointer types: a pointer to the object you get from operator*, a pointer to the actual managed storage, and a pointer to that instance's owner. The reason that the instance must contain a pointer to its owner is that it is per-instance data, not information associated with the block itself. It is the instance of circle_ptr that needs to be able to tell its owner when it is rebound, so the instance needs that data.
And this must be static overhead. You can't know when a circle_ptr instance is within another type and when it isn't. So every circle_ptr, even those that don't use the cycle detection features, must bear this 3 pointer cost.
So not only does this require a large degree of fragility, it's also expensive, bloating the type's size by 50%. Replacing shared_ptr with this type (or more to the point, augmenting shared_ptr with this functionality) is just not viable.
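To make the space cost concrete, here is roughly the per-instance layout this implies (again using the hypothetical names from this discussion), compared with shared_ptr's usual two pointers:

struct control_block;          // stand-in for the shared control block
class cycle_detector_mixin;    // the mixin discussed above

template <typename T>
class circle_ptr {
    T*                    object;  // what operator* / operator-> would return
    control_block*        block;   // shared refcount / managed storage
    cycle_detector_mixin* owner;   // must be notified whenever this instance is rebound
};
// ...versus shared_ptr<T>, which is essentially just { T*; control_block*; }.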
On the plus side, you no longer need users who derive from cycle_detector_mixin to implement a way to fetch the list of circle_ptr instances. Instead, you have the class register itself with the circle_ptr instances. This allows circle_ptr instances that could be cyclic to talk directly to their owning cycle_detector_mixin.
So there's something.
Encapsulation and invariants
The need to be able to tell a class to invalidate one of its circle_ptr objects fundamentally changes the way the class can interact with any of its circle_ptr members.
An invariant is some state that a piece of code assumes is true because it should be logically impossible for it to be false. If you check that a const int variable is > 0, then you have established an invariant for later code that this value is positive.
Encapsulation exists to allow you to be able to build invariants within a class. Constructors alone can't do it, because external code could modify any values that the class stores. Encapsulation allows you to prevent external code from making such modifications. And therefore, you can develop invariants for various data stored by the class.
This is what encapsulation is for.
With a shared_ptr, it is possible to build an invariant around the existence of such a pointer. You can design your class so that the pointer is never null. And therefore, nobody has to check for it being null.
That's not the case with circle_ptr. If you implement the cycle_detector_mixin, then your code must be able to handle the case of any of those circle_ptr instances becoming null. Your destructor therefore cannot assume that they are valid, nor can any code that your destructor calls make that assumption.
Your class therefore cannot establish an invariant with the object pointed to by circle_ptr. At least, not if it's part of a cycle_detector_mixin with its associated registration and whatnot.
You can argue that your design does not technically break encapsulation, since the circle_ptr instances can still be private. But the class is willingly giving up encapsulation to the cycle detection system. And therefore, the class can no longer ensure certain kinds of invariants.
That sounds like breaking encapsulation to me.
Thread safety
In order to access a weak_ptr, the user must lock it. This returns a shared_ptr, which ensures that the object will remain alive (if it still was). Locking is an atomic operation, just like reference incrementing/decrementing. So this is all thread-safe.
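For readers less familiar with it, this is the pattern in question (Widget is just an illustrative type):

#include <memory>

struct Widget { void use() {} };

void tryUse(const std::weak_ptr<Widget>& w) {
    if (std::shared_ptr<Widget> s = w.lock()) {   // atomically takes a strong reference
        s->use();                                 // the object is guaranteed alive here
    }                                             // otherwise it has already been destroyed
}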
circle_ptrs may not be very thread safe. It may be possible for a circle_ptr to become invalid from another thread, if the other thread released the last non-circular reference to it.
I'm not entirely sure about this. It may be that such circumstances only appear if you've already had a data race on the object's destruction, or are using a non-owning reference. But I'm not sure that your design can be thread safe.
Virulence factors
This idea is incredibly viral. Every other type where cyclic references can happen must implement this interface. It's not something you can put on one type. In order to get the benefits, every type that could participate in a cyclical reference must use it. Consistently and correctly.
If you try to make circle_ptr require that the object it manages implement cycle_detector_mixin, then you make it impossible to use such a pointer with any other type. It wouldn't be a replacement of (or augmentation for) shared_ptr. So there is no way for a compiler to help detect accidental misuse.
Sure, there are accidental misuses of enable_shared_from_this that cannot be detected by compilers. However, that is not a viral construct. It is therefore only a problem for those who need this feature. By contrast, the only way to get a benefit from cycle_detector_mixin is to use it as comprehensively as possible.
Equally importantly, because this idea is so viral, you will be using it a lot. And therefore, you are far more likely to encounter the multiple-inheritance problem than users of enable_shared_from_this. And that's not a minor issue. Especially since cycle_detector_mixin will likely use static_cast to access the derived class, so you won't be able to use virtual inheritance.
Summation
So here is what you must do, without fail, in order to detect cycles, none of which the compiler will verify:
Every class participating in a cycle must be derived from cycle_detector_mixin.
Anytime a cycle_detector_mixin-derived class constructs a circle_ptr instance within itself (either directly or indirectly, but not within a class that itself derives from cycle_detector_mixin), pass a pointer to yourself to that circle_ptr.
Don't assume that any circle_ptr subobject of a class is valid; it may possibly even become invalid within a member function thanks to threading issues.
And here are the costs:
Cycle-detecting data structures within cycle_detector_mixin.
Every circle_ptr must be 50% bigger, even the ones that aren't used for cycle detection.
Misconceptions about ownership
Ultimately, I think this whole idea comes down to a misconception about what shared_ptr is actually for.
"A cycle detector is unnecessary because cycles are not that frequent and they can be easily avoided using std::weak_ptr." Cycles in fact turn up easily in many simple data structures - e.g. a tree where children have a back-pointer to the parent, or a doubly-linked list. In some cases, cycles between heterogenous objects in complex systems are formed only occasionally with certain patterns of data and are hard to predict and avoid. In some cases it is far from obvious which pointer to replace with the weak variant.
This is a very common argument for general-purpose GC. The problem with this argument is that it usually makes an assumption about the use of smart pointers that just isn't valid.
To use a shared_ptr means something. If a class stores a shared_ptr, that represents that the class has ownership of that object.
So explain this: why does a node in a linked list need to own both the next and previous nodes? Why does a child node in a tree need to own its parent node? Oh, they need to be able to reference the other nodes. But they do not need to control the lifetime of them.
For example, I would implement a tree node as an array of unique_ptr to their children, with a single pointer to the parent. A regular pointer, not a smart pointer. After all, if the tree is constructed correctly, the parent will own its children. So if a child node exists, its parent node must exist; the child cannot exist without having a valid parent.
With a doubly-linked list, I might have the left pointer be a unique_ptr, with the right being a regular pointer. Or vice versa; one way is no better than the other.
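A minimal sketch of that ownership layout (TreeNode and add_child are illustrative names, not taken from anywhere): the only owning pointers run parent-to-child, so no reference cycle can form among them.

#include <memory>
#include <vector>

struct TreeNode {
    TreeNode* parent = nullptr;                      // non-owning back-pointer
    std::vector<std::unique_ptr<TreeNode>> children; // the parent owns its children

    TreeNode& add_child() {
        children.push_back(std::make_unique<TreeNode>());
        children.back()->parent = this;
        return *children.back();
    }
};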
Your mentality seems to be that we should always be using shared_ptr for things, and just let the automatic system work out how to deal with the problems. Whether it's circular references or whatever, just let the system figure it out.
That's not what shared_ptr is for. The goal of smart pointers is not that you don't think about ownership anymore; it's that you can express ownership relationships directly in code.
Overall
How is any of this an improvement over using weak_ptr to break cycles? Instead of recognizing when cycles might happen and doing extra work, you now do a bunch of extra work everywhere. Work that is exceedingly fragile; if you do it wrong, you're no better off than if you missed a place where you should have used weak_ptr. Only it's worse, because you probably think your code is safe.
The illusion of safety is worse than no safety at all. At least the latter makes you careful.
Could you implement something like this? Possibly. Is it an appropriate type for the standard library? No. It's just too fragile. You must implement it correctly, at all times, in all ways, everywhere that cycles might appear... or you get nothing.
Authoritative references
There can be no authoritative references for something that was never proposed, suggested, or even imagined for standardization. Boost has no such type, and such constructs were never even considered for boost::shared_ptr. Even the very first smart pointer paper (PDF) never considered the possibility. The subject of expanding shared_ptr to automatically be able to handle cycles through some manual effort has never been discussed even on the standard proposal forums where far stupider ideas have been deliberated.
The closest to a reference I can provide is this paper from 1994 about a reference-counted smart pointer. This paper basically talks about making the equivalent of shared_ptr and weak_ptr part of the language (this was in the early days; they didn't even think it was possible to write a shared_ptr that allowed casting a shared_ptr<T> to a shared_ptr<U> when U is a base of T). But even so, it specifically says that cycles would not be collected. It doesn't spend much time on why not, but it does state this:
However, cycles of collected objects with clean-up functions are problematic. If A and B are reachable from each other, then destroying either one first will violate the ordering guarantee, leaving a dangling pointer. If the collector breaks the cycle arbitrarily, programmers would have no real ordering guarantee, and subtle, time-dependent bugs could result. To date, no one has devised a safe, general solution to this problem [Hayes 92].
This is essentially the encapsulation/invariant issue I pointed out: making a pointer member of a type invalid breaks an invariant.
So basically, few people have even considered the possibility, and those few who did quickly discarded it as being impractical. If you truly believe that they're wrong, the single best way to prove it is by implementing it yourself. Then propose it for standardization.
std::weak_ptr is the solution to this problem. Your worry about
a tree where children have a back-pointer to the parent
can be solved by using raw pointers as the back-pointer. You have no worry of leakage if you think about it.
and your worry about
doubly-linked list
is solved by std::weak_ptr or a raw one.
I believe that the answer to your question is that, contrary to what you claim, there is no efficient way to automatically handle cyclic references. Checking for cycles must be carried out every time a "shared_ptr" is destroyed. On the other hand, introducing any deferring mechanism will inevitably result in non-deterministic behavior.
The shared_ptr was not made for automatic reclamation of circular references. It existed in the Boost library for some time before being adopted into the standard library. It is a class that attaches a reference counter to any C++ object - be it an array, a class, or an int. It is a relatively lightweight and self-sufficient class. It does not know what it contains, except that it knows a deleter function to call when needed.
All this cycle resolution requires too much heavy machinery. If you like GC, you can use another language that was designed for GC from the beginning. Bolting it onto the standard library would look ugly. A language extension, as in C++/CLI, would be much nicer.
With reference counting alone, what you ask for is impossible. In order to identify a cycle, one would have to keep track of the identity of the references to your object. That is easy in memory-managed languages, since the virtual machine knows who references whom.
In C++ you could only do that by holding, inside the cycle-aware pointer, a list of identifiers - e.g. UUIDs - for the objects referencing your resource. This would imply that the UUID is somehow passed into the structure when the object is acquired, or that the pointer has access to the resource's internals.
This becomes implementation-specific, since you require a different pointer interface (e.g. copy and assignment could not be implemented as they are for raw pointers), and it demands that every platform have a source of UUIDs, which cannot be assumed for every system. You could of course use the memory address as a UUID.
Still, handling copying and proper assignment without a specialized assign method would probably require a single source that allocates references. This cannot be embedded in the language, but it could be implemented for a specific application as a global registry.
Apart from that, copying such an enlarged shared pointer would incur a larger performance impact, since those operations would have to perform lookups for adding, removing, or resolving cycles. And cycle detection in the registered graph, from a complexity point of view, requires a DFS traversal with backtracking, which is at least proportional to the number of references. I don't see how all of this does not amount to a GC.

Why garbage collection when RAII is available?

I hear talks of C++14 introducing a garbage collector in the C++ standard library itself.
What is the rationale behind this feature? Isn't this the reason that RAII exists in C++?
How will the presence of standard library garbage collector affect the RAII semantic?
How does it matter to me (the programmer) or the way in which I write C++ programs?
Garbage collection and RAII are useful in different contexts. The presence of GC should not affect your use of RAII. Since RAII is well-known, I give two examples where GC is handy.
Garbage collection would be a great help in implementing lock-free data structures.
[...] it turns out that deterministic memory freeing is quite a fundamental problem in lock-free data structures. (from Lock-Free Data Structures By Andrei Alexandrescu)
Basically the problem is that you have to make sure you are not deallocating the memory while a thread is reading it. That's where GC becomes handy: It can look at the threads and only do the deallocation when it is safe. Please read the article for details.
Just to be clear here: it doesn't mean that the WHOLE WORLD should be garbage collected as in Java; only the relevant data should be garbage collected accurately.
In one of his presentations, Bjarne Stroustrup also gave a good, valid example where GC becomes handy. Imagine an application written in C/C++, 10M SLOC in size. The application works reasonably well (fairly bug free), but it leaks. You have neither the resources (man hours) nor the functional knowledge to fix this. The source code is somewhat messy legacy code. What do you do? I agree that sweeping the problem under the rug with GC is perhaps the easiest and cheapest way out.
As it has been pointed out by sasha.sochka, the garbage collector will be optional.
My personal concern is that people would start using GC like it is used in Java and would write sloppy code and garbage collect everything. (I have the impression that shared_ptr has already become the default 'go to' even in cases where unique_ptr or, hell, stack allocation would do it.)
I agree with @DeadMG that there is no GC in the current C++ standard, but I would like to add the following citation from B. Stroustrup:
When (not if) automatic garbage collection becomes part of C++, it will be optional
So Bjarne is sure that it will be added in the future. At least the chairman of the EWG (Evolution Working Group) and one of the most important committee members (and, more importantly, the language's creator) wants to add it.
Unless he changed his opinion we can expect it to be added and implemented in the future.
There are some algorithms which are complicated/inefficient/impossible to write without a GC. I suspect this is the major selling point for GC in C++, and I can't ever see it being used as a general-purpose allocator.
Why not a general-purpose allocator?
First, we have RAII, and most (including me) seem to believe that this is a superior method of resource management. We like determinism because it makes writing robust, leak-free code a lot simpler and makes performance predictable.
Second, you'll need to place some very un-C++-like restrictions on how you can use memory. For instance, you'd need at least one reachable, un-obfuscated pointer. Obfuscated pointers, as are popular in common tree container libraries (using alignment-guaranteed low bits for color flags) among others, won't be recognizable by the GC.
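For instance, something like the following is common in intrusive red-black tree implementations; a collector scanning memory would not recognize the field as a pointer (the type and field names here are illustrative):

#include <cstdint>

struct RBNode {
    std::uintptr_t parent_and_color = 0;   // low bit = color, upper bits = RBNode*

    RBNode* parent() const {
        return reinterpret_cast<RBNode*>(parent_and_color & ~std::uintptr_t(1));
    }
    bool is_red() const { return (parent_and_color & 1) != 0; }
    void set_parent(RBNode* p, bool red) {
        parent_and_color = reinterpret_cast<std::uintptr_t>(p) | std::uintptr_t(red);
    }
};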
Related to that, the things which make modern GCs so usable are going to be very difficult to apply to C++ if you support any number of obfuscated pointers. Generational defragmenting GCs are really cool, because allocating is extremely cheap (essentially just incrementing a pointer) and eventually your allocations get compacted into something smaller with improved locality. To do this, objects need to be movable.
To make an object safely movable, the GC needs to be able to update all the pointers to it. It won't be able to find obfuscated ones. This could be accommodated, but wouldn't be pretty (probably a gc_pin type or similar, used like the current std::lock_guard, which is used whenever you need a raw pointer). Usability would be out the door.
Without making things movable, a GC would be significantly slower and less scalable than what you're used to elsewhere.
With usability reasons (resource management) and efficiency reasons (fast, movable allocations) out of the way, what else is GC good for? Certainly not general-purpose use. Enter lock-free algorithms.
Why lock-free?
Lock-free algorithms work by letting an operation under contention go temporarily "out of sync" with the data structure and detecting/correcting this at a later step. One effect of this is that under contention memory might be accessed after it has been deleted. For example, if you have multiple threads competing to pop a node from a LIFO, it is possible for one thread to pop and delete the node before another thread has realized the node was already taken:
Thread A:
Get pointer to root node.
Get pointer to next node from root node.
Suspend
Thread B:
Get pointer to root node.
Suspend
Thread A:
Pop node. (replace root node pointer with next node pointer, if root node pointer hasn't changed since it was read.)
Delete node.
Suspend
Thread B:
Get pointer to next node from our copy of the root node pointer, which is now "out of sync" and was just deleted, so instead we crash.
With GC you can avoid the possibility of reading from freed memory, because the node would never be deleted while Thread B is referencing it. There are ways around this, such as hazard pointers or catching SEH exceptions on Windows, but these can hurt performance significantly. GC tends to be the most optimal solution here.
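Here is a minimal sketch of the data structure being described, a Treiber-style stack, showing where the use-after-free window is (names are illustrative, and no safe-reclamation scheme is used on purpose):

#include <atomic>

struct Node { int value; Node* next; };

struct LockFreeStack {
    std::atomic<Node*> head{nullptr};

    bool pop(int& out) {
        Node* old = head.load();
        while (old) {
            Node* next = old->next;                  // (!) another thread may have popped
                                                     //     and deleted 'old' by now
            if (head.compare_exchange_weak(old, next)) {
                out = old->value;
                delete old;                          // safe only if no other thread can
                return true;                         // still be reading 'old'
            }
        }
        return false;
    }
};

With GC (or hazard pointers and the like), the explicit delete goes away, and the racy read of old->next can never touch reclaimed memory.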
There isn't, because there isn't one. The only features C++ ever had for GC were introduced in C++11, and they're just for marking memory; there's no collector required. Nor will there be one in C++14.
There is no way in hell a collector could pass Committee, is my opinion.
None of the answers so far touch upon the most important benefit of adding garbage-collection to a language: In the absence of language-supported garbage-collection, it's almost impossible to guarantee that no object will be destroyed while references to it exist. Worse, if such a thing does happen, it's almost impossible to guarantee that a later attempt to use the reference won't end up manipulating some other random object.
Although there are many kinds of objects whose lifetimes can be much better managed by RAII than by a garbage collector, there's considerable value in having the GC manage nearly all objects, including those whose lifetime is controlled by RAII. An object's destructor should kill the object and make it useless, but leave the corpse behind for the GC. Any reference to the object will thus become a reference to the corpse, and will remain one until it (the reference) ceases to exist entirely. Only when all references to the corpse have ceased to exist will the corpse itself do so.
While there are ways of implementing garbage collectors without inherent language support, such implementations either require that the GC be informed any time references are created or destroyed (adding considerable hassle and overhead), or run the risk that a reference the GC doesn't know about might exist to an object which is otherwise unreferenced. Compiler support for GC eliminates both those problems.
GC has the following advantages:
It can handle circular references without programmer assistance (with RAII-style reference counting, you have to use weak_ptr to break cycles; see the short sketch after this list). So an RAII-style application can still "leak" if it is used improperly.
Creating/destroying tons of shared_ptrs to a given object can be expensive because refcount increment/decrement are atomic operations. In multi-threaded applications the memory locations which contain refcounts will be "hot" places, putting a lot of pressure on the memory subsystem. GC isn't prone to this specific issue, because it uses reachable sets instead of refcounts.
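Here is the classic example of the first point: two objects that own each other through shared_ptr are never destroyed, while changing either link to weak_ptr breaks the cycle (Node is an illustrative type):

#include <memory>

struct Node {
    std::shared_ptr<Node> other;   // change to std::weak_ptr<Node> to break the cycle
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->other = b;
    b->other = a;
}   // a and b go out of scope, but the two Nodes keep each other alive forever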
I am not saying that GC is the best/good choice. I am just saying that it has different characteristics. In some scenarios that might be an advantage.
Definitions:
RCB GC: Reference-Counting Based GC.
MSB GC: Mark-Sweep Based GC.
Quick Answer:
MSB GC should be added into the C++ standard, because it is more handy than RCB GC in certain cases.
Two illustrative examples:
Consider a global buffer whose initial size is small, where any thread can dynamically enlarge it while keeping the old contents accessible to other threads.
Implementation 1 (MSB GC Version):
int* g_buf = 0;
size_t g_current_buf_size = 1024;

void InitializeGlobalBuffer()
{
    g_buf = gcnew int[g_current_buf_size];
}

int GetValueFromGlobalBuffer(size_t index)
{
    return g_buf[index];
}

void EnlargeGlobalBufferSize(size_t new_size)
{
    if (new_size > g_current_buf_size)
    {
        auto tmp_buf = gcnew int[new_size];
        memcpy(tmp_buf, g_buf, g_current_buf_size * sizeof(int));
        std::swap(tmp_buf, g_buf);
        g_current_buf_size = new_size;
    }
}
Implementation 2 (RCB GC Version):
std::shared_ptr<int> g_buf;
size_t g_current_buf_size = 1024;

std::shared_ptr<int> NewBuffer(size_t size)
{
    return std::shared_ptr<int>(new int[size], [](int* p) { delete[] p; });
}

void InitializeGlobalBuffer()
{
    g_buf = NewBuffer(g_current_buf_size);
}

int GetValueFromGlobalBuffer(size_t index)
{
    return g_buf.get()[index];
}

void EnlargeGlobalBufferSize(size_t new_size)
{
    if (new_size > g_current_buf_size)
    {
        auto tmp_buf = NewBuffer(new_size);
        memcpy(tmp_buf.get(), g_buf.get(), g_current_buf_size * sizeof(int));
        std::swap(tmp_buf, g_buf);
        g_current_buf_size = new_size;
        //
        // Now tmp_buf owns the old g_buf; when tmp_buf is destructed,
        // the old buffer will also be deleted.
        //
    }
}
PLEASE NOTE:
After calling std::swap(tmp_buf, g_buf);, tmp_buf owns the old g_buf. When tmp_buf is destructed, the old g_buf will also be deleted.
If another thread is calling GetValueFromGlobalBuffer(index); to fetch the value from the old g_buf, then A Race Hazard Will Occur!!!
So, though implementation 2 looks as elegant as implementation 1, it doesn't work!
If we want to make implementation 2 work correctly, we must add some kind of locking mechanism; then it will be not only slower, but also less elegant than implementation 1.
Conclusion:
It is good to take MSB GC into the C++ standard as an optional feature.

Is boost::interprocess::shared_ptr threadsafe (and interprocess-safe)?

I want to share data between threads, and have it automatically deleted when the last user is done with it. This seems to work, most of the time, using boost::interprocess::shared_ptr in a boost::fixed_managed_shared_memory segment: but not always.
So, is boost::interprocess::shared_ptr thread (and interprocess) -safe?
If I'm using my shared memory at a fixed address (I'm pretty certain this is going to be okay in my 64-bit (well, 48-bit) address space), is it possible to use a normal boost::shared_ptr (which are threadsafe) instead?
some clarification:
The pointer type I use is plain void*, (my shared memory is mapped to a fixed address).
The question of threadsafety is about the reference count -- i.e., whether copying/destroying shared pointers to the same thing in different processes at the same time is permitted. Not access to the same shared pointer in different threads, and not access to the pointee.
The reference count used in boost::interprocess::shared_ptr is implemented using an atomic counter defined in boost/interprocess/detail/atomic.hpp, with the refcount logic mainly implemented by boost/interprocess/smart_ptr/detail/sp_counted_base_atomic.hpp. The intent is to have the refcount be handled in a thread- (and interprocess-) safe manner.
The atomic operation implementations differ depending on the specific target platform (Windows uses the Win32 Interlocked APIs, some platforms use various inline assembly, etc). It might be helpful to know what platform you're targeting. I suppose that you may be running into a bug in the refcount handling, though I wouldn't count on it.
I've restricted the above answer to the area you wanted specifically addressed:
The question of threadsafety is about the reference count -- i.e., whether copying/destroying shared pointers to the same thing in different processes at the same time is permitted. Not access to the same shared pointer in different threads, and not access to the pointee.
That said, I'd look at bugs introduced possibly by the items you mention above or by somehow creating 'independent' boost::interprocess::shared_ptr objects (where different shared_ptrs refer to the same object using different refcounts). This situation can be easy to have happen if you have some code that continues to use and/or pass around the raw object pointer.
boost::shared_ptr<T> is not interprocess safe, so whether it is multithread safe in this context is moot. (This statement assumes that BOOST_SP_DISABLE_THREADS has not been #defined for the program's operation.)
boost::interprocess::shared_ptr<T> is, in its very nature, designed to be cross-process safe, as well as multithread safe in its nature. When the last reference goes out of scope, the pointed-at object can be cleaned up. Obviously, this cleaning up happens within the bounds of the shared memory segment used for the object.
Since boost::shared_ptr<T> uses a lock-free counting mechanism as of version 1.33.0 on many platforms, cross-process deletion of an object in a shared-memory segment is unlikely to succeed except by the remotest of chances, and it does not appear to be functionality supported by the Boost maintainers.
Er. boost::shared_ptr is most definitely not thread-safe. At least not more thread-safe than e.g. std::vector. You may read a boost::shared_ptr from multiple threads, but as soon as any thread is writing a boost::shared_ptr it must synchronize with other writers and readers.
And no, you cannot use it in shared memory; it was never designed for that. E.g. it uses a so-called "shared count" object that stores the reference count and the deleter, and that shared-count object is allocated by the shared_ptr code, so it will not reside in shared memory. Also, the shared_ptr code (and metadata like vtables) might be at totally different addresses in different processes, so any virtual function call would also be a problem (and IIRC shared_ptr uses virtual functions internally - or at least function pointers, which leads to the same problem).
I don't know if boost::interprocess::shared_ptr is interprocess-safe, but I'm pretty sure it's not. Interprocess synchronization is pretty expensive. Having boost::interprocess::shared_ptr not do it makes it possible for the user to block accesses to shared data. That way the high synchronization cost only has to be paid once for multiple accesses in a row.
EDIT: I would expect that the usage pattern that Eamon Nerbonne referred to in his comment (which is thread-safe with boost::shared_ptr) is also OK with boost::interprocess::shared_ptr. Can't say for sure, though.
"This seems to work, most of the time, using boost::interprocess::shared_ptr in a boost::fixed_managed_shared_memory segment: but not always."
If "not always" means that deletion doesn't always work:
Just use a semaphore (or user counter) alongside your thread-safe container. The semaphore does not itself improve thread safety, but it lets you verify and even limit how many users are using the data. If the count is 0, there are no more users and it is safe to delete the shared data.
If only one user is left, the count will be 1, so copy out the user-requested data, delete the shared container, and then return the copy.
Looking through the code in shared_ptr.hpp, and at the docs on the boost website, it would seem as though dereferencing a single instance may or may not be threadsafe depending on the second template parameter, which determines the internal pointer type to be used. Specifically,
"The internal pointer will be of the same pointer type as typename VoidAllocator::pointer type (that is, if typename VoidAllocator::pointer is offset_ptr, the internal pointer will be offset_ptr)."
And since dereferences merely return the result of the get()/get_pointer() method of this class, it should probably depend entirely on that. boost::shared_ptr will work if you want simultaneous read-only access. For write access from multiple threads, you might have to write your own wrapper modelled after offset_ptr.
As pgroke alludes to (not sure why the downvotes), the core question is whether you are accessing the same shared_ptr instance from different threads or processes.
shared_ptr (interprocess or not) does not support this scenario, and this will not be safe.
On the other hand, shared_ptr is designed to have multiple (thread-private, or protected from concurrent modification via some other mechanism) shared pointer instances point to the same object, and have different instances of these pointers to the same object be modified concurrently without issue.
::interprocess:: here is mostly a red-herring - it doesn't change the thread-safety of the pointer, just makes sure there are no internal pointers that refer to process-private memory, etc.
So which of the two cases is it?

Is there a language with RAII + Ref counting that does not have unsafe pointer arithmetic?

RAII = Resource Acquisition is Initialization
Ref Counting = "poor man's GC"
Together, they are quite powerful (like a ref-counted 3D object holding a VBO, which it frees when its destructor is called).
Now, the question is -- does RAII exist in any language besides C++? In particular, in a language that does not allow pointer arithmetic / buffer overflows?
D has RAII but still has pointer arithmetic :( BUT, you don't really have to use it. Please note getting D to work was a pain in the butt for me so IM JUST SAYING.
While not exactly RAII, Python has the with statement and C# has the using statement.
Perl 5 has ref counting and destructors that are guaranteed to be called when all references fall out of scope, so RAII is available in the language, although most Perl programmers don't use the term.
And Perl 5 does not expose raw pointers to Perl code.
Perl 6, however, has a real garbage collector, and in fact allows the garbage collector to be switched out; so you can't rely on things being collected in any particular order.
I believe Python and Lua use reference counting.
Perl, Python (CPython), PHP, and Tcl are reference counted and have mechanisms to destroy an object once its reference count goes to zero, which can happen as soon as a variable goes out of scope. Built-in types are released automatically. User-defined classes have a way to define a destructor that will get called upon release.
There are some edge cases: global variables might not be released until the end of the program, and circular references may not be released until the end either (though PHP has recently implemented a GC that handles this case, and Python 2 added a cycle detector).
Python (the standard CPython, not variants like Jython, Unladen Swallow and IronPython) uses reference counting for its objects.
With that, it also has RAII and (mostly) deterministic garbage collection. For example, this is supposed to work deterministically closing files:
def a():
    fp = open('/my/file', 'r')
    return fp.read()
Note fp.close() is never called. As soon as fp goes out of scope, the object should be destroyed. However, there are some cases where deterministic finalization is not guaranteed, such as in:
Something throws an exception and the traceback is currently being handled, or a reference is kept to it (note sys.last_traceback keeps the last traceback)
Cyclic references exist in an object, causing the reference count to not go to zero
Therefore, while Python theoretically has deterministic finalization, it is better to explicitly close any resource where it's possible that an exception (IOError or the like) could cause the object to remain live.
Vala's memory management of objects is based on reference counting, and it has RAII (in the sense that its destructors are called deterministically). The typical use case is to create GUIs, where the overhead from refcounting is often negligible. You can use pointers and bypass the refcounting, for example for interoperability, or if you need the extra performance, but in most cases you can live without pointers. It also does something clever: you can mark references as owned or unowned and transfer ownership, and in many cases it is able to elide the reference counting (if an object does not escape a function, for example). Vala is closely connected to GObject/GTK, so it only makes sense to use it if you want to work in that ecosystem.
Another interesting candidate would be Rust. While it also has pointers and garbage collection, both are optional. You can write programs completely with an equivalent of C++'s smart pointers, with guaranteed no leaks, and it supports RAII. It also has a concept of reference ownership, like Vala, but a bit more complex. Essentially, Rust gives you complete control over how you manage memory. You can work at the bare metal level, and could even write a kernel in it, or you can work at a high level with a GC, or anything in between, and most of the time it protects you from leaking memory or other pointer-related bugs. The downside is that it is pretty complex, and since it is still in development things might change.