LLVM: disable the stack

When compiling a language that uses heap-allocated closures, which hold their arguments either by value or as pointers to heap-allocated storage, how might one guarantee that LLVM doesn't generate code that uses the stack (and, optionally, treats ebp/esp as just another general-purpose register)?
The function may be running in a strand/microthread which doesn't have a C/traditional stack at all.
Edit
Here is an article that might be relevant, but it doesn't say whether there is any guarantee that LLVM won't try to consume stack:
http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt
Edit 2
To be clear, I don't need the stack for regular calls because I can generate new closures with pointers to old closures so that the state is proper on returning from a call.
Think of this as just a linked stack.

You would have to write your own backend. The existing x86/x64 backends are not designed to support this: stack use is basically mandatory for virtually all functions. You could lower alloca instructions to your own pseudo-stack that actually lives on the heap, but you would also have to develop your own call stack, your own function-calling ABIs, all of that nasty stuff. Even then, you're basically down to emulating a stack.
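To make the "pseudo-stack on the heap" idea concrete, here is a minimal hand-written C++ sketch of heap-allocated activation records linked the way the question's "linked stack" describes. The names are purely illustrative and this is not something the stock LLVM backends will emit for you; a custom lowering pass would have to rewrite every alloca and every call/return to route through something like this.

#include <cstdlib>

// One heap-allocated activation record per call; frames are linked
// instead of occupying contiguous stack addresses.
struct Frame {
    Frame* caller;      // link to the previous frame
    int    locals[4];   // storage that alloca/locals would normally get
};

Frame* push_frame(Frame* caller) {
    Frame* f = static_cast<Frame*>(std::malloc(sizeof(Frame)));
    f->caller = caller;
    return f;
}

Frame* pop_frame(Frame* f) {
    Frame* caller = f->caller;
    std::free(f);
    return caller;
}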
The only programming environments that do not provide a hardware stack and are still even somewhat useful are GPUs, realistically, and that's only because they offer so many available registers and each function is supposed to be very restricted. Only having register space available would cripple programs running on x86 or x64.

The abstract machine described by the C++ standard clearly describes a stack model; it is used in describing the behaviour of any (non-static) variables with automatic storage duration. (Member objects may be on the stack if the enclosing object is.)
While it is technically possible, I know of not a single C++ compiler that compiles code in such a way that the stack is not (dynamically) used. The compiler would have to emulate the stack in other memory (the heap?), which would make it much less efficient than using the CPU infrastructure that is already there. This makes it unlikely that any compiler would ever offer this kind of compilation mode.
Now, answering a potential X/Y part of the question, I assume you are looking for a way to implement stackless coroutines: Boost Asio provides just that: Stackless Coroutines.
Actually the actor-oriented design of Boost Asio knows about "strands" and you can alternatively use Stackless/Stackful coroutines.
I think you might be most interested in the HTTP Server Sample using Stackless Coroutines; here's a teaser (server.cpp, comments removed for brevity):
reenter(this) {
    do {
        socket_.reset(new tcp::socket(acceptor_->get_io_service()));
        yield acceptor_->async_accept(*socket_, *this);
        fork server(*this)();
    } while (is_parent());

    buffer_.reset(new boost::array<char, 8192>);
    request_.reset(new request);

    do {
        yield socket_->async_read_some(boost::asio::buffer(*buffer_), *this);
        boost::tie(valid_request_, boost::tuples::ignore)
            = request_parser_.parse(*request_,
                buffer_->data(), buffer_->data() + length);
    } while (boost::indeterminate(valid_request_));

    reply_.reset(new reply);
    if (valid_request_)
        request_handler_(*request_, *reply_);
    else
        *reply_ = reply::stock_reply(reply::bad_request);

    yield boost::asio::async_write(*socket_, reply_->to_buffers(), *this);
    socket_->shutdown(tcp::socket::shutdown_both, ec);
}
For completeness, there's also Boost Coroutine (which builds on Boost Context) for Stackful Coroutines. These are significantly more heavyweight, as the library will actively save and restore register/stack contents on context switch.

Related

How to prevent a specific function or method from being called from within an interrupt routine?

A hurdle in using C++ (instead of handling C++ as "C with a few extra features") in low-level embedded development is caused by interrupts. Often there are cases where time must be measured in individual CPU cycles, and some classes have no business being used there. Implicit conversions, copy constructors and similar things can cause issues when used within interrupts, especially if they happen behind the scenes. On the other hand, some carefully designed classes could be meaningfully used in interrupts.
This raises a question: is there an elegant way to signal the compiler that a specific class, function, or variable (or just code lines) should throw a compile error when used inside an interrupt? (to not have to rely so heavily on comments stating //Never call this from an interrupt!)
(If not, would it make sense to try getting something like this into the next C++ standard? I'm thinking of something like how the constexpr specifier informs the compiler what a function or variable can or cannot be used for. Would a nointerrupt specifier make sense, or does something already exist that provides that functionality?)
Like the main system context (and task functions, if you are using an OS), interrupt service routines are just other contexts. The question of which code may run where (methods of which subset of classes) is orthogonal to the design of the (class) libraries themselves.
RTOS libraries often impose "limits" which APIs may be called in ISRs (or, in ISRs with a given priority) - but these limits are "only" required to maintain data consistency, and not for HW technical reasons imposed by the µC infrastructure.
You want to limit the use of the existing class libraries to certain context(s) because some class implementations consume more resources (time, stack memory space?) than you can afford in ISRs, but this is a matter of your deployment decisions in SW project architecture.
Therefore, the most elegant solution from my point of view would be to go back to the SW architecture level, create a neat piece of documentation (a UML deployment diagram alone may already help you quite a lot), and specify in which contexts you want which code to be executed. You will probably find that in ISR contexts only a tiny fraction of your code is executed, so it may be feasible to mark, from that top-down perspective, the subset of classes (or subsets of methods of classes) that are supposed to run in ISR contexts. You can back up this procedure by setting/resetting an ISR-specific flag in a dedicated shared memory area at the start/end of every ISR, and by adding assert statements to the critical classes/methods that detect whether the code is being run in an unwanted ISR context.
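As a rough illustration of that last suggestion, here is a minimal C++ sketch of the flag-plus-assert approach. The names (g_in_isr, IsrGuard, timer_isr, ExpensiveService) are invented for the example; a real system might instead read the CPU's interrupt status register.

#include <cassert>

volatile bool g_in_isr = false;     // dedicated shared flag, touched only by ISRs

struct IsrGuard {                   // set/clear the flag around every ISR body
    IsrGuard()  { g_in_isr = true;  }
    ~IsrGuard() { g_in_isr = false; }
};

void timer_isr()                    // hypothetical interrupt service routine
{
    IsrGuard guard;
    // ... time-critical ISR work only ...
}

class ExpensiveService {
public:
    void do_heavy_work()
    {
        // Detect accidental use from an unwanted ISR context at test time.
        assert(!g_in_isr && "do_heavy_work() must not be called from an ISR");
        // ... allocation-heavy or slow code ...
    }
};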

Performance vs. C++ memory model

With the new shared-memory concurrency features of C++11 it is possible for two threads to allocate memory at the same time. Furthermore, since the compiler does not know in advance whether the compiled code will be run by multiple threads at once, it has to assume the worst. My understanding is therefore that the compiled code has to synchronize trips to the heap in some way, which would slow down single-threaded code that does not need such synchronization.
Wouldn't this contradict the C++ dictum that "you only pay for what you use"? Is the overhead so small that it was not considered important? Are there other areas where the C++ memory model slows down code that is, in the end, only ever used single-threaded?
Heap managers do indeed need to synchronize, and that is a possible performance problem for multi-threaded code. It's up to the program to mitigate that if necessary. Standard libraries are also reacting, trying to include better multi-threaded allocators.
Edit: Some thoughts about the questions in the second paragraph.
Even C++ needs to be sufficiently safe to be usable. "YDPFWYU" is nice, but if it means that you have to wrap a mutex around every allocation if you want to use the code in a multi-threaded environment, you have a big problem. It's like exceptions, really: even code that doesn't actively use them should be somewhat aware that it might be used in a context where they exist, and both the programmer and the compiler need to be aware of that. The compiler needs to create exception support code/data structures, while the programmer needs to write exception-safe code. Multi-threading is the same, only worse: any piece of code you write might be used in a multi-threaded environment, so you need to write thread-safe code, and the compiler/environment needs to be aware of threading (forgo some very unsafe optimizations, and have a thread-safe allocator).
These are the points in C++ where you pay even for what you don't use, as far as the standard is concerned. Your particular compiler might give you an escape hatch (disable exceptions, use single-threaded runtime library), but that's no longer real C++ then.
That said, even (or especially) if you have a single global allocator lock, the overhead for a single-threaded program is minimal: locks are only expensive when under contention. An uncontested mutex lock/unlock is not very significant compared to the rest of the allocator operation.
Under contention, the story is different, which is where custom allocators possibly come in.
As I briefly mentioned above, one other place where C++ is slowed down very slightly by the mere existence of multi-threading is the prohibition of some particular optimizations. The compiler cannot invent reads and writes (especially writes) to possibly shared variables (like globals, or things you handed out a pointer to) in code paths that wouldn't ordinarily have these accesses. This may slow down very specific pieces of code, but overall, in a program, it's very unlikely that you'll notice.
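A minimal sketch of the kind of transformation that is now forbidden (the function and variable names are invented for the example):

int counter;                         // global, therefore possibly shared

void tally(const bool* flags, int n)
{
    for (int i = 0; i < n; ++i)
        if (flags[i])
            ++counter;               // the only path that writes counter
}

Before C++11 a compiler could keep counter in a register for the whole loop and store it back unconditionally afterwards; under the C++11 memory model that would invent a write on executions where no flag was set, which could race with another thread writing counter, so the optimization is disallowed.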
You are mixing allocation and access of heap memory.
Multi-threaded heap allocation is indeed synchronized, but at a C library level, at least in all modern (con)current OSes' C libraries. There may be specific-purpose C libraries that don't do this. See for example the old single- and multithreaded C runtime library for MSVC (note how new versions of MSVS deprecate and even remove single-threaded variants). I assume glibc has a similar mechanism, and is probably also solely multithreaded, and so always synchronized. I haven't heard anyone complain about multithreaded memory allocation speeds, so if you have a concrete complaint, I'd like to see it properly explained and documented with reproducible code.
Access of heap memory (i.e. after a call to new or malloc has returned) is not protected by any mechanism whatsoever. C++11 gives you mutexes and other synchronization facilities that you, as a user, need to apply in your code if you want to protect against race conditions. If you do not, no performance is lost.
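A minimal sketch of that division of labour, assuming C++11 (the names are illustrative):

#include <mutex>
#include <vector>

std::vector<int> shared_data;   // heap-backed container shared between threads
std::mutex       data_mutex;    // synchronization you must supply yourself

void append(int value)
{
    std::lock_guard<std::mutex> lock(data_mutex);  // protects the access to the data...
    shared_data.push_back(value);                  // ...while the allocation inside push_back
                                                   // is already synchronized by the runtime's allocator
}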
Compilers are really not forced to forgo optimization. It has always been possible to write a very bad compiler and a very bad "standard" library, and nowadays that is nothing but poor quality, even if it is advertised as the "only real C++".
"Any piece of code you write might be used in a multi-threaded environment, so you need to write thread-safe code, and the compiler/environment needs to be aware of threading" - that is plainly wrong.
A good implementation can always provide a normal way to optimize single-threaded code (and the necessary libraries), code that does not use exceptions, and so on for the other features.
(For example, threading requires certain functions to create and coordinate threads; their use is visible at link time and may affect the toolchain. Or the first call to a thread-creation function may switch the memory-allocation method (and have other effects). And there may be other good ways, like special compiler switches, etc.)
Not really. Both physical memory and backing store are system resources on modern operating systems. So allocations of them and accesses to them have to be properly synchronized.
The case of threads sharing virtual memory is just a special case of the many other ways scheduling entities can share virtual memory. Consider two processes that memory map the same library or data file.
The only extra overhead with threads is modifications to the virtual memory map because threads share a virtual memory map. Much of the synchronization overhead is unavoidable. For example, if you're unmapping something, some resource typically has to be returned to a system-level pool, and that requires synchronization anyway.
On many platforms, some special platform-specific thing is needed to let other threads running at the same time know that their view of virtual memory has changed. But this disappears anyway if there are no other threads since there would be nothing to notify.
It is simply a reality that some features have costs even if they're not being used. The existence of swap logic and checks in your kernel has some costs even if you never swap. Engineers are realists and have to balance costs and benefits.
Both physical memory and backing store are system resources on modern operating systems. So allocations of them and accesses to them have to be properly synchronized.
The case of threads sharing virtual memory is just a special case of the many other ways scheduling entities can share virtual memory.
Since these are features of the operating system, there is no need for additional code in the C/C++ allocation functions of the application program (in reality, of course, multi-threading needs special additional synchronization in the standard library and additional system calls; see the question at the beginning).
The real trouble may be having several variants (single- and multi-threaded) of the same library (the standard C/C++ library, and others too) in the system... But...

Mixed-Mode Process vs. Managed-to-Unmanaged IPC

I am trying to come up with design candidates for a current project that I am working on. Its client interface is based on WCF Services exposing public methods and call backs. Requests are routed all the way to C++ libraries (that use boost) that perform calculations, operations, etc.
The current scheme is based on a WCF Service talking to a separate native C++ process via IPC.
To make things a little simpler, there is a recommendation around here to go mixed-mode (i.e. to have a single .NET process which loads the native C++ layer inside it, most likely communicating to it via a very thin C++/CLI layer). The main concern is whether garbage collection or other .NET aspects would hinder the performance of the unmanaged C++ part of the process.
I started looking up the concepts of safe points and GC helper methods (e.g. KeepAlive(), etc.) but I couldn't find any direct discussion about this, or benchmarks. From what I understand so far, one of the safe points is when a thread is executing unmanaged code, and in that case garbage collection does not suspend that thread (is this correct?) to perform the cleanup.
I guess the main question I have is whether there is a performance concern on the native side when running these two types of code in the same process, versus having separate processes.
If you have a thread that has never executed any managed code, it will not be frozen during .NET garbage collection.
If a thread which uses managed code is currently running in native code, the garbage collector won't freeze it, but instead mark the thread to stop when it next reaches managed code. However, if you're thinking of a native dispatch loop that doesn't return for a long time, you may find that you're blocking the garbage collector (or leaving stuff pinned causing slow GC and fragmentation). So I recommend keeping your threads performing significant tasks in native code completely pure.
Making sure that the compiler isn't silently generating MSIL for some standard C++ code (thereby making it execute as managed code) is a bit tricky. But in the end you can accomplish this with careful use of #pragma managed(push, off).
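A small sketch of what that looks like in a file compiled with /clr; this is MSVC-specific, and the function here is just a made-up example of the "pure native" work mentioned above:

#include <cstddef>

#pragma managed(push, off)      // everything from here on compiles to native code, not MSIL

void native_hot_loop(double* data, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= 2.0;         // pure native work, no managed transitions
}

#pragma managed(pop)            // restore the previous compilation mode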
It is very easy to get a mixed mode application up and running, however it can be very hard to get it working well.
I would advise thinking carefully before choosing that design - in particular about how you layer your application and the sort of lifetimes you expect for your unmanaged objects. A few thoughts from past experiences:
C++ object lifetime - by architecture.
Use C++ objects briefly in local scope then dispose of them immediately.
It sounds obvious but is worth stating: C++ objects are unmanaged resources that are designed to be used as unmanaged resources. Typically they expect deterministic creation and destruction, often making extensive use of RAII. This can be very awkward to control from a managed program. The IDisposable pattern exists to try to solve this. It can work well for short-lived objects but is rather tedious and difficult to get right for long-lived objects. In particular, if you start making unmanaged objects members of managed classes rather than things that live in function scope only, very quickly every class in your program has to be IDisposable, and suddenly managed programming becomes harder than unmanaged programming.
The GC is too aggressive.
It is always worth remembering that when we talk about managed objects going out of scope, we mean in the eyes of the IL compiler/runtime, not the language you are reading the code in. If an unmanaged object is kept around as a member and a managed object is designed to delete it, things can get complicated. If your dispose pattern is not complete from top to bottom of your program, the GC can get rather aggressive. Say, for example, you write a managed class which deletes an unmanaged object in its finaliser, and the last thing you do with the managed object is access the unmanaged pointer to call a method. The GC may decide that during that unmanaged call is a great time to collect the managed object. Suddenly your unmanaged pointer is deleted mid method call.
The GC is not aggressive enough.
If you are working within address constraints (e.g. you need a 32 bit version) then you need to remember that the GC holds on to memory unless it thinks it needs to let go. Its only input to these thoughts is the managed world. If the unmanaged allocator needs space there is no connection to the GC. An unmanaged allocation can fail simply because the GC hasn't collected objects that are long out of scope. There is a memory pressure API but again it is only really usable/useful for quite simple designs.
Buffer copying. You also need to think about where to allocate any large memory blocks. Managed blocks can be pinned to look like unmanaged blocks. Unmanaged blocks can only ever be copied if they need to look like managed blocks. However when will that large managed block actually get released?

How to set the stacksize with C++11 std::thread

I've been trying to familiarize myself with the std::thread library in C++11, and have arrived at a stumbling block.
Initially I come from a posix threads background, and was wondering how does one setup the stack size of the std::thread prior to construction, as I can't seem to find any references to performing such a task.
Using pthreads setting the stack size is done like this:
#include <pthread.h>

void* foo(void* arg);

...

pthread_attr_t attribute;
pthread_t thread;
pthread_attr_init(&attribute);
pthread_attr_setstacksize(&attribute, 1024);  /* size in bytes; must be at least PTHREAD_STACK_MIN or EINVAL is returned */
pthread_create(&thread, &attribute, foo, 0);
pthread_join(thread, 0);
Is there something similar when using std::thread?
I've been using the following reference:
http://en.cppreference.com/w/cpp/thread
Initially I come from a posix threads background, and was wondering how does one setup the stack size of the std::thread prior to construction, as I can't seem to find any references to performing such a task.
You can't. std::thread doesn't support this because std::thread is standardized, and C++ does not require that a machine even has a stack, much less a fixed-size one.
pthreads are more restrictive in terms of the hardware they support: they assume there is some fixed stack size per thread, so they can let you configure it.
As Loki Astari already said, it is extremely rare to actually need a non-default stack-size and usually either a mistake or the result of bad coding.
If you feel like the default stack size is too big for your needs and want to reduce it, just forget about it. Every modern OS now uses virtual memory / on-demand commit, which means that memory is only reserved, not actually allocated, until you access the pages. Reducing the stack size will not reduce your actual memory footprint.
Due to this very behaviour, OSes can afford to set the default stack size to very big values. E.g. on a vanilla Debian this is 8MB (ulimit -s) which should be enough for every need. If you still manage to hit that limit, my first idea would be that your code is wrong, so you should first and foremost review it and move things to the heap, transform recursive functions into loops, etc.
If despite all of this you really really need to change the stack size (i.e. increase it, since reducing it is useless), on POSIX you can always use setrlimit at the start of your program to increase the default stack size. Sure this will affect all threads, but only the ones who need it will actually use the additional memory.
Last but not least, in all fairness I can see a corner case where reducing the stack size would make sense: if you have tons of threads on a 32 bits system, they could eat up your virtual address space (again, not the actual memory consumption) up to the point that you don't have enough address space available for the heap. Again, setrlimit is your friend here even though I'd advise to move to a 64 bits system to benefit from the larger virtual address space (and if your program is that big anyway, you'll probably benefit from the additional RAM too).
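For reference, the setrlimit call mentioned above looks roughly like this on POSIX. Treat it as a sketch to experiment with; note that a later answer in this thread reports it only affected the main thread's stack on Linux and OSX.

#include <sys/resource.h>
#include <cstdio>

int main()
{
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    std::printf("current soft stack limit: %ld bytes\n", (long)rl.rlim_cur);

    rl.rlim_cur = 64 * 1024 * 1024;           // ask for a 64 MB soft limit
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        std::perror("setrlimit");

    // ... start the rest of the program / create threads here ...
}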
I have also been investigating this issue. For some applications, the default stack size is not adequate. Examples: the program does deep recursion dependent on the specific problem it is solving; the program needs to create many threads and memory consumption is an issue.
Here is a summary of (partial) solutions / workarounds I found:
g++ supports a -fsplit-stack option on Linux; see the split-stack documentation for more information. Here is a summary from their website:
The goal of split stacks is to permit a discontiguous stack which is
grown automatically as needed. This means that you can run multiple
threads, each starting with a small stack, and have the stack grow and
shrink as required by the program.
Remark: -fsplit-stack only worked for me after I started using the gold linker.
It seems clang++ will also support this flag. The version I tried (clang++ 3.3) crashed when trying to compile my application using the flag -fsplit-stack.
On Linux, set the stack size by executing ulimit -s <size> before starting your application, where size is the stack size in KB. Remark: the command ulimit -s unlimited did not affect the size of threads created with std::thread. When I used ulimit -s unlimited, the main thread could grow, but the threads created with std::thread had the default size.
On Windows using Visual Studio, we can use the linker /STACK parameter, or /STACKSIZE in the module definition file; this is the default size for all created threads. See this link for more information. We can also modify this parameter in any executable using the command line tool EDITBIN.
On Windows using mingw g++, we can use the option -Wl,--stack,<size>. For some reason, when using cygwin g++, this flag only affects the size of the main thread.
Approaches that did not work for me:
ulimit -s <size> on OSX. It affects only the size of the main thread. Moreover, the Mac OSX default for a pthread stack size is 512kB.
setrlimit only affects the size of the main thread on Linux and OSX. On cygwin, it never worked for me, it seems it always returns an error.
For OSX, the only alternative seems to be to use boost::thread instead of std::thread, but this is not nice if we want to stick with the standard. I hope g++ and clang++ will also support -fsplit-stack on OSX in the future.
I found this in Scott Meyers' book Overview of the New C++ (C++0x); as it's quite long I can't post it as a comment. Is this helpful?
There is also a standard API for getting at the platform-specific
handles behind threads, mutexes, condition variables, etc. These
handles are assumed to be the mechanism for setting thread priorities,
setting stack sizes, etc. (Regarding setting stack sizes, Anthony
Williams notes: "Of those OSs that support setting the stack size,
they all do it differently. If you're coding for a specify platform
(such that use of the native_handle would be OK), then you could use
that platform's facilities to switch stacks. e.g. on POSIX you could
use makecontext and swapcontext along with explicit allocation of a
stack, and on Windows you could use Fibers. You could then use the
platform-specific facilities (e.g. Linker flags) to set the default
stack size to something really tiny, and then switch stacks to
something bigger where necessary.")
Was looking for the answer to this myself just now.
It appears that while std::thread does not support this, boost::thread does.
In particular, you can use boost::thread::attributes to accomplish this:
#include <boost/thread.hpp>

boost::thread::attributes attrs;
attrs.set_stack_size(4096 * 10);                // stack size in bytes
boost::thread myThread(attrs, fooFunction, 42);
You can do something like this yourself if you don't want to include a big library.
It is still dependent on the compiler's STL implementation (Clang / MSVC at the moment).
HackingSTL Library
std::thread thread = std::stacking_thread(65536, []{
    printf("Hello, world!\n");
});

Garbage Collection in C++ -- why?

I keep hearing people complaining that C++ doesn't have garbage collection. I also hear that the C++ Standards Committee is looking at adding it to the language. I'm afraid I just don't see the point to it... using RAII with smart pointers eliminates the need for it, right?
My only experience with garbage collection was on a couple of cheap eighties home computers, where it meant that the system would freeze up for a few seconds every so often. I'm sure it has improved since then, but as you can guess, that didn't leave me with a high opinion of it.
What advantages could garbage collection offer an experienced C++ developer?
I keep hearing people complaining that C++ doesn't have garbage collection.
I am so sorry for them. Seriously.
C++ has RAII, and I always complain to find no RAII (or a castrated RAII) in Garbage Collected languages.
What advantages could garbage collection offer an experienced C++ developer?
Another tool.
Matt J put it quite rightly in his post (Garbage Collection in C++ -- why?): we don't need C++ features, as most of them could be coded in C, and we don't need C features, as most of them could be coded in assembly, etc. C++ must evolve.
As a developer: I don't care about GC. I have tried both RAII and GC, and I find RAII vastly superior. As said by Greg Rogers in his post (Garbage Collection in C++ -- why?), memory leaks are not so terrible (at least in C++, where they are rare if C++ is used properly) as to justify GC instead of RAII. GC has non-deterministic deallocation/finalization and is just a way to write code that simply doesn't care about specific memory choices.
This last sentence is important: it is important to write code that "just doesn't care". In the same way that with C++ RAII we don't care about freeing resources, because RAII does it for us, or about object initialization, because the constructor does it for us, it is sometimes important to just code without caring about who owns which memory, and what kind of pointer (shared, weak, etc.) we need for this or that piece of code. There seems to be a need for GC in C++ (even if I personally fail to see it).
An example of good GC use in C++
Sometimes, in an app, you have "floating data". Imagine a tree-like structure of data, but no one is really "owner" of the data (and no one really cares about when exactly it will be destroyed). Multiple objects can use it, and then, discard it. You want it to be freed when no one is using it anymore.
The C++ approach is to use a smart pointer. boost::shared_ptr comes to mind, so each piece of data is owned by its own shared pointer. Cool. The problem appears when pieces of data can refer to one another: you cannot use shared pointers alone, because they use a reference counter, which won't cope with circular references (A points to B, and B points to A). So you must now think a lot about where to use weak pointers (boost::weak_ptr) and where to use shared pointers.
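A minimal sketch of the cycle problem, using std::shared_ptr for brevity (the same applies to boost::shared_ptr); the Node type is invented for the example:

#include <memory>

struct Node {
    std::shared_ptr<Node> other;    // strong reference; a weak_ptr here would break the cycle
};

int main()
{
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->other = b;                   // A points to B...
    b->other = a;                   // ...and B points to A
}   // both locals go out of scope, but each Node still has a count of 1:
    // no destructor runs and the pair is leaked. A tracing GC would reclaim it.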
With a GC, you just use the tree structured data.
The downside being that you must not care when the "floating data" will really be destroyed. Only that it will be destroyed.
Conclusion
So in the end, if done properly, and compatible with the current idioms of C++, GC would be yet another good tool for C++.
C++ is a multiparadigm language: adding a GC will perhaps make some C++ fanboys cry treason, but in the end it could be a good idea, and I guess the C++ Standards Committee won't let this kind of major feature break the language. So we can trust them to do the work necessary to enable a correct C++ GC that won't interfere with C++: as always in C++, if you don't need a feature, don't use it and it will cost you nothing.
The short answer is that garbage collection is very similar in principle to RAII with smart pointers. If every piece of memory you ever allocate lies within an object, and that object is only referred to by smart pointers, you have something close to garbage collection (potentially better). The advantage comes from not having to be so judicious about scoping and smart-pointering every object, and letting the runtime do the work for you.
This question seems analogous to "what does C++ have to offer the experienced assembly developer? instructions and subroutines eliminate the need for it, right?"
With the advent of good memory checkers like valgrind, I don't see much use to garbage collection as a safety net "in case" we forgot to deallocate something - especially since it doesn't help much in managing the more generic case of resources other than memory (although these are much less common). Besides, explicitly allocating and deallocating memory (even with smart pointers) is fairly rare in the code I've seen, since containers are a much simpler and better way usually.
But garbage collection can potentially offer performance benefits, especially if a lot of short-lived objects are being heap allocated. GC also potentially offers better locality of reference for newly created objects (comparable to objects on the stack).
The motivating factor for GC support in C++ appears to be lambda programming, anonymous functions etc. It turns out that lambda libraries benefit from the ability to allocate memory without caring about cleanup. The benefit for ordinary developers would be simpler, more reliable and faster compiling lambda libraries.
GC also helps simulate infinite memory; the only reason you need to delete PODs is that you need to recycle memory. If you have either GC or infinite memory, there is no need to delete PODs anymore.
I don't understand how one can argue that RAII replaces GC, or is vastly superior. There are many cases handled by a GC that RAII simply cannot deal with at all. They are different beasts.
First, RAII is not bullet proof: it works against some common failures which are pervasive in C++, but there are many cases where RAII does not help at all; it is fragile to asynchronous events (like signals under UNIX). Fundamentally, RAII relies on scoping: when a variable is out of scope, it is automatically freed (assuming the destructor is correctly implemented of course).
Here is a simple example where neither auto_ptr nor RAII can help you:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <memory>

using namespace std;

volatile sig_atomic_t got_sigint = 0;

class A {
public:
    A() { printf("ctor\n"); }
    ~A() { printf("dtor\n"); }
};

void catch_sigint(int sig)
{
    got_sigint = 1;
}

/* Emulate expensive computation */
void do_something()
{
    sleep(3);
}

void handle_sigint()
{
    printf("Caught SIGINT\n");
    exit(EXIT_FAILURE);
}

int main(void)
{
    A a;
    auto_ptr<A> aa(new A);
    signal(SIGINT, catch_sigint);
    while (1) {
        if (got_sigint == 0) {
            do_something();
        } else {
            handle_sigint();
            return -1;
        }
    }
}
The destructor of A will never be called. Of course, this is an artificial and somewhat contrived example, but a similar situation can actually happen; for example, when your code is called by other code which handles SIGINT and over which you have no control at all (a concrete example: mex extensions in MATLAB). It is the same reason why finally in Python does not guarantee that something is executed. GC can help you in this case.
Other idioms do not play well with this either: in any non-trivial program you will need stateful objects (I am using the word object in a very broad sense here; it can be any construction allowed by the language); if you need to control state outside one function, you can't easily do that with RAII (which is why RAII is not that helpful for asynchronous programming). On the other hand, a GC has a view of the whole memory of your process, that is, it knows about all the objects it allocated, and can clean up asynchronously.
It can also be much faster to use a GC, for the same reasons: if you need to allocate/deallocate many objects (in particular small objects), a GC will vastly outperform RAII unless you write a custom allocator, since the GC can allocate/clean many objects in one pass. Some well-known C++ projects use GC, even where performance matters (see for example Tim Sweeney on the use of GC in Unreal Tournament: http://lambda-the-ultimate.org/node/1277). GC basically increases throughput at the cost of latency.
Of course, there are cases where RAII is better than GC; in particular, the GC concept is mostly concerned with memory, and that's not the only resource. Things like files can be handled well with RAII. Languages without manual memory handling like Python or Ruby do have something like RAII for those cases, BTW (the with statement in Python). RAII is very useful when you need precise control over when the resource is freed, and that's quite often the case for files or locks, for example.
The committee isn't adding garbage-collection, they are adding a couple of features that allow garbage collection to be more safely implemented. Only time will tell whether they actually have any effect whatsoever on future compilers. The specific implementations could vary widely, but will most likely involve reachability-based collection, which could involve a slight hang, depending on how it's done.
One thing is, though, no standards-conformant garbage collector will be able to call destructors - only to silently reuse lost memory.
What advantages could garbage collection offer an experienced C++ developer?
Not having to chase down resource leaks in your less-experienced colleagues' code.
It's an all-too-common error to assume that because C++ does not have garbage collection baked into the language, you can't use garbage collection in C++, period. This is nonsense. I know of elite C++ programmers who use the Boehm collector as a matter of course in their work.
Garbage collection allows you to postpone the decision about who owns an object.
C++ uses value semantics, so with RAII objects are indeed reclaimed when going out of scope. This is sometimes referred to as "immediate GC".
When your program starts using reference semantics (through smart pointers etc.), the language no longer supports you; you're left to the wit of your smart pointer library.
The tricky thing about GC is deciding upon when an object is no longer needed.
Garbage collection makes RCU lockless synchronization much easier to implement correctly and efficiently.
Easier thread safety and scalability
There is one property of GC which may be very important in some scenarios. Assignment of a pointer is naturally atomic on most platforms, while creating thread-safe reference-counted ("smart") pointers is quite hard and introduces significant synchronization overhead. As a result, smart pointers are often said "not to scale well" on multi-core architectures.
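A minimal sketch of that contrast, assuming C++11 (the Node type and the publish function are invented for the example):

#include <atomic>
#include <memory>

struct Node { int value; };

std::atomic<Node*>    raw_head{nullptr};   // plain pointer: a store is a single atomic operation
std::shared_ptr<Node> counted_head;        // reference-counted pointer

void publish(Node* n, std::shared_ptr<Node> sp)
{
    raw_head.store(n, std::memory_order_release);      // cheap, lock-free on mainstream CPUs
    std::atomic_store(&counted_head, std::move(sp));   // must also update the reference count;
                                                       // often implemented with an internal lock
}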
Garbage collection is really the basis for automatic resource management. And having GC changes the way you tackle problems in a way that is hard to quantify. For example when you are doing manual resource management you need to:
Consider when an item can be freed (are all modules/classes finished with it?)
Consider whose responsibility it is to free a resource when it is ready to be freed (which class/module should free this item?)
In the trivial case there is no complexity. E.g. you open a file at the start of a method and close it at the end. Or the caller must free this returned block of memory.
Things start to get complicated quickly when you have multiple modules that interact with a resource and it is not as clear who needs to clean up. The end result is that the whole approach to tackling a problem includes certain programming and design patterns which are a compromise.
In languages that have garbage collection you can use a disposable pattern where you can free resources you know you've finished with but if you fail to free them the GC is there to save the day.
Smart pointers are actually a perfect example of the compromises I mentioned. Smart pointers can't save you from leaking cyclic data structures unless you have a backup mechanism. To avoid this problem you often compromise and avoid using a cyclic structure, even though it may otherwise be the best fit.
I, too, have doubts that the C++ committee is adding full-fledged garbage collection to the standard.
But I would say that the main reason for adding/having garbage collection in a modern language is that there are too few good reasons against garbage collection. Since the eighties there have been several huge advances in the field of memory management and garbage collection, and I believe there are even garbage collection strategies that could give you soft-real-time-like guarantees (like, "GC won't take more than .... in the worst case").
using RAII with smart pointers eliminates the need for it, right?
Smart pointers can be used to implement reference counting in C++ which is a form of garbage collection (automatic memory management) but production GCs no longer use reference counting because it has some important deficiencies:
Reference counting leaks cycles. Consider A↔B, both objects A and B refer to each other so they both have a reference count of 1 and neither is collected but they should both be reclaimed. Advanced algorithms like trial deletion solve this problem but add a lot of complexity. Using weak_ptr as a workaround is falling back to manual memory management.
Naive reference counting is slow for several reasons. Firstly, it requires out-of-cache reference counts to be bumped often (see Boost's shared_ptr up to 10× slower than OCaml's garbage collection). Secondly, destructors injected at the end of scope can incur unnecessary-and-expensive virtual function calls and inhibit optimizations such as tail call elimination.
Scope-based reference counting keeps floating garbage around as objects are not recycled until the end of scope whereas tracing GCs can reclaim them as soon as they become unreachable, e.g. can a local allocated before a loop be reclaimed during the loop?
What advantages could garbage collection offer an experienced C++ developer?
Productivity and reliability are the main benefits. For many applications, manual memory management requires significant programmer effort. By simulating an infinite-memory machine, garbage collection liberates the programmer from this burden which allows them to focus on problem solving and evades some important classes of bugs (dangling pointers, missing free, double free). Furthermore, garbage collection facilitates other forms of programming, e.g. by solving the upwards funarg problem (1970).
In a framework that supports GC, a reference to an immutable object such as a string may be passed around in the same way as a primitive. Consider the class (C# or Java):
public class MaximumItemFinder
{
    String maxItemName = "";
    int maxItemValue = -2147483647 - 1;

    public void AddAnother(int itemValue, String itemName)
    {
        if (itemValue >= maxItemValue)
        {
            maxItemValue = itemValue;
            maxItemName = itemName;
        }
    }

    public String getMaxItemName() { return maxItemName; }
    public int getMaxItemValue() { return maxItemValue; }
}
Note that this code never has to do anything with the contents of any of the strings, and can simply treat them as primitives. A statement like maxItemName = itemName; will likely generate two instructions: a register load followed by a register store. The MaximumItemFinder will have no way of knowing whether callers of AddAnother are going to retain any reference to the passed-in strings, and callers will have no way of knowing how long MaximumItemFinder will retain references to them. Callers of getMaxItemName will have no way of knowing if and when MaximumItemFinder and the original supplier of the returned string have abandoned all references to it. Because code can simply pass string references around like primitive values, however, none of those things matter.
Note also that while the class above would not be thread-safe in the presence of simultaneous calls to AddAnother, any call to GetMaxItemName would be guaranteed to return a valid reference to either an empty string or one of the strings that had been passed to AddAnother. Thread synchronization would be required if one wanted to ensure any relationship between the maximum-item name and its value, but memory safety is assured even in its absence.
I don't think there's any way to write a method like the above in C++ which would uphold memory safety in the presence of arbitrary multi-threaded usage without either using thread synchronization or else requiring that every string variable have its own copy of its contents, held in its own storage space, which may not be released or relocated during the lifetime of the variable in question. It would certainly not be possible to define a string-reference type which could be defined, assigned, and passed around as cheaply as an int.
Garbage Collection Can Make Leaks Your Worst Nightmare
Full-fledged GC that handles things like cyclic references would be somewhat of an upgrade over a ref-counted shared_ptr. I would somewhat welcome it in C++, but not at the language level.
One of the beauties about C++ is that it doesn't force garbage collection on you.
I want to correct a common misconception: the myth that garbage collection somehow eliminates leaks. From my experience, the worst nightmares of debugging code written by others and trying to spot the most expensive logical leaks involved garbage collection, with languages like embedded Python running inside a resource-intensive host application.
When talking about subjects like GC, there's theory and then there's practice. In theory it's wonderful and prevents leaks. Yet at the theoretical level, so is every language wonderful and leak-free since in theory, everyone would write perfectly correct code and test every single possible case where a single piece of code could go wrong.
Garbage collection combined with less-than-ideal team collaboration caused the worst, hardest-to-debug leaks in our case.
The problem still has to do with ownership of resources. You have to make clear design decisions here when persistent objects are involved, and garbage collection makes it all too easy to think that you don't.
Given some resource, R, in a team environment where the developers aren't constantly communicating and reviewing each other's code carefully at all times (something a little too common in my experience), it becomes quite easy for developer A to store a handle to that resource. Developer B does as well, perhaps in an obscure way that indirectly adds R to some data structure. So does C. In a garbage-collected system, this has created 3 owners of R.
Because developer A was the one who created the resource originally and thinks of himself as its owner, he remembers to release the reference to R when the user indicates that he no longer wants to use it. After all, if he failed to do so, nothing would happen, and it would be obvious from testing that the user-end removal logic did nothing. So he remembers to release it, as any reasonably competent developer would. This triggers an event which B handles, and B also remembers to release the reference to R.
However, C forgets. He's not one of the stronger developers on the team: a somewhat fresh recruit who has only worked in the system for a year. Or maybe he's not even on the team, just a popular third-party developer writing plugins for our product that many users add to the software. With garbage collection, this is when we get those silent logical resource leaks. They're the worst kind: they do not necessarily manifest in the user-visible side of the software as an obvious bug, besides the fact that over long runs of the program the memory usage just continues to rise and rise for some mysterious reason. Trying to narrow down these issues with a debugger can be about as fun as debugging a time-sensitive race condition.
Without garbage collection, developer C would have created a dangling pointer. He may try to access it at some point and cause the software to crash. Now that's a testing/user-visible bug. C gets embarrassed a bit and corrects his bug. In the GC scenario, just trying to figure out where the system is leaking may be so difficult that some of the leaks are never corrected. These are not valgrind-type physical leaks that can be detected easily and pinpointed to a specific line of code.
With garbage collection, developer C has created a very mysterious leak. His code may continue to access R which is now just some invisible entity in the software, irrelevant to the user at this point, but still in a valid state. And as C's code creates more leaks, he's creating more hidden processing on irrelevant resources, and the software is not only leaking memory but also getting slower and slower each time.
So garbage collection does not necessarily mitigate logical resource leaks. It can, in less than ideal scenarios, make leaks far easier to silently go unnoticed and remain in the software. The developers might get so frustrated trying to trace down their GC logical leaks that they simply tell their users to restart the software periodically as a workaround. It does eliminate dangling pointers, and in a safety-obsessed software where crashing is completely unacceptable under any scenario, then I would prefer GC. But I'm often working in less safety-critical but resource-intensive, performance-critical products where a crash that can be fixed promptly is preferable to a really obscure and mysterious silent bug, and resource leaks are not trivial bugs there.
In both of these cases, we're talking about persistent objects not residing on the stack, like a scene graph in a 3D software or the video clips available in a compositor or the enemies in a game world. When resources tie their lifetimes to the stack, both C++ and any other GC language tend to make it trivial to manage resources properly. The real difficulty lies in persistent resources referencing other resources.
In C or C++, you can have dangling pointers and crashes resulting from segfaults if you fail to clearly designate who owns a resource and when handles to them should be released (ex: set to null in response to an event). Yet in GC, that loud and obnoxious but often easy-to-spot crash is exchanged for a silent resource leak that may never be detected.