Embedded C++ : to use STL or not? - c++

I have always been an embedded software engineer, but usually at Layer 3 or 2 of the OSI stack. I am not really a hardware guy. I have generally always done telecoms products, usually hand/cell-phones, which generally means something like an ARM 7 processor.
Now I find myself in a more generic embedded world, in a small start-up, where I might move to "not so powerful" processors (there's the subjective bit) - I cannot predict which.
I have read quite a bit about debate about using STL in C++ in embedded systems and there is no clear cut answer. There are some small worries about portability, and a few about code size or run-time, but I have two major concerns:
1 - exception handling; I am still not sure whether to use it (see Embedded C++ : to use exceptions or not?)
2 - I strongly dislike dynamic memory allocation in embedded systems, because of the problems it can introduce. I generally have a buffer pool which is statically allocated at compile time and which serves up only fixed size buffers (if no buffers, system reset). The STL, of course, does a lot of dynamic allocation.
Now I have to make the decision whether to use or forego the STL - for the whole company, for ever (it's going into some very core s/w).
Which way do I jump? Super-safe & lose much of what constitutes C++ (imo, it's more than just the language definition) and maybe run into problems later or have to add lots of exception handling & maybe some other code now?
I am tempted to just go with Boost, but 1) I am not sure if it will port to every embedded processor I might want to use and 2) on their website, they say that they doesn't guarantee/recommend certain parts of it for embedded systems (especially FSMs, which seems weird). If I go for Boost & we find a problem later ....

I work on real-time embedded systems every day. Of course, my definition of embedded system may be different than yours. But we make full use of the STL and exceptions and do not experience any unmanageable problems. We also make use of dynamic memory (at a very high rate; allocating lots of packets per second, etc.) and have not yet needed to resort to any custom allocators or memory pools. We have even used C++ in interrupt handlers. We don't use boost, but only because a certain government agency won't let us.
It is our experience you can indeed use many modern C++ features in an embedded environment as long as you use your head and conduct your own benchmarks. I highly recommend you make use of Scott Meyer's Effective C++ 3rd edition as well as Sutter and Alexandrescu's C++ Coding Standards to assist you in using C++ with a sane programming style.
Edit: After getting an upvote on this 2 years later, let me post an update. We are much farther along in our development and we have finally hit spots in our code where the standard library containers are too slow under high performance conditions. Here we did in fact resort to custom algorithms, memory pools, and simplified containers. That is the beauty of C++ though, you can use the standard library and get all the good things it provides for 90% of your use cases. You don't throw it all out when you meet problems, you just hand-optimize the trouble spots.

Super-safe & lose much of what
constitutes C++ (imo, it's more than
just the language definition) and
maybe run into problems later or have
to add lots of exception handling &
maybe some other code now?
We have a similar debate in the game world and people come down on both sides. Regarding the quoted part, why would you be concerned about losing "much of what constitutes C++"? If it's not pragmatic, don't use it. It shouldn't matter if it's "C++" or not.
Run some tests. Can you get around STL's memory management in ways that satisfy you? If so, was it worth the effort? A lot of problems STL and boost are designed to solve just plain don't come up if you design to avoid haphazard dynamic memory allocation... does STL solve a specific problem you face?
Lots of people have tackled STL in tight environments and been happy with it. Lots of people just avoid it. Some people propose entirely new standards. I don't think there's one right answer.

The other posts have addressed the important issues of dynamic memory allocation, exceptions and possible code bloat. I just want to add: Don't forget about <algorithm>! Regardless of whether you use STL vectors or plain C arrays and pointers, you can still use sort(), binary_search(), random_shuffle(), the functions for building and managing heaps, etc. These routines will almost certainly be faster and less buggy than versions you build yourself.
Example: unless you think about it carefully, a shuffle algorithm you build yourself is likely to produce skewed distributions; random_shuffle() won't.

Paul Pedriana from Electronic Arts wrote in 2007 a lengthy treatise on why the STL was inappropriate for embedded console development and why they had to write their own. It's a detailed article, but the most important reasons were:
STL allocators are slow, bloated,
and inefficient
Compilers aren't actually very good at inlining all those deep function calls
STL allocators don't support explicit alignment
The STL algorithms that come with GCC and MSVC's STL aren't very performant, because they're very platform-agnostic and thus miss a lot of microoptimizations that can make a big difference.
Some years ago, our company made the decision not to use the STL at all, instead implementing our own system of containers that are maximally performant, easier to debug, and more conservative of memory. It was a lot of work but it has repaid itself many times over. But ours is a space in which products compete on how much they can cram into 16.6ms with a given CPU and memory size.
As to exceptions: they are slow on consoles, and anyone who tells you otherwise hasn't tried timing them. Simply compiling with them enabled will slow down the entire program because of the necessary prolog/epilog code -- measure it yourself if you don't believe me. It's even worse on in-order CPUs than it is on the x86. For this reason, the compiler we use doesn't even support C++ exceptions.
The performance gain isn't so much from avoiding the cost of an exception throw — it's from disabling exceptions entirely.

Let me start out by saying I haven't done embedded work for a few years, and never in C++, so my advice is worth every penny you're paying for it...
The templates utilized by STL are never going to generate code you wouldn't need to generate yourself, so I wouldn't worry about code bloat.
The STL doesn't throw exceptions on its own, so that shouldn't be a concern. If your classes don't throw, you should be safe. Divide your object initialization into two parts, let the constructor create a bare bones object and then do any initialization that could fail in a member function that returns an error code.
I think all of the container classes will let you define your own allocation function, so if you want to allocate from a pool you can make it happen.

The open source project "Embedded Template Library (ETL)" targets the usual problems with the STL used in Embedded Applications by providing/implementing a library:
deterministic behaviour
"Create a set of containers where the size or maximum size is determined at compile time. These containers should be largely equivalent to those supplied in the STL, with a compatible API."
no dynamic memory allocation
no RTTI required
little use of virtual functions (only when absolutely necessary)
set of fixed capacity containers
cache friendly storage of containers as continously allocated memory block
reduced container code size
typesafe smart enumerations
CRC calculations
checksums & hash functions
variants = sort of type safe unions
Choice of asserts, exceptions, error handler or no checks on errors
heavily unit tested
well documented source code
and other features...
You can also consider a commercial C++ STL for Embedded Developers provided by E.S.R. Labs.

for memory management, you can implement your own allocator, which request memory from the pool. And all STL container have a template for the allocator.
for exception, STL doesn't throw many exceptions, in generally, the most common are: out of memory, in your case, the system should reset, so you can do reset in the allocator. others are such as out of range, you can avoid it by the user.
so, i think you can use STL in embedded system :)

In addition to all comments, I would propose you reading of Technical Report on C++ Performance which specifically addresses topics that you are interested in: using C++ in embedded (including hard real-time systems); how exception-handling usually implemented and which overhead it has; free store allocation's overhead.
The report is really good as is debunks many popular tails about C++ performance.

It basically depends on your compiler and in the amount of memory you have. If you have more than a few Kb of ram, having dynamic memory allocation helps a lot. If the implementation of malloc from the standard library that you have is not tuned to your memory size you can write your own, or there are nice examples around such as mm_malloc from Ralph Hempel that you can use to write your new and delete operators on top.
I don't agree with those that repeat the meme that exceptions and stl containers are too slow, or too bloated etc. Of course it adds a little more code than a simple C's malloc, but judicious use of exceptions can make code much clear and avoid too much error checking blurb in C.
One has to keep in mind that STL allocators will increase their allocations in powers of two, which means sometimes it will do some reallocations until it reaches the correct size, which you can prevent with reserve so it becomes as cheap as one malloc of the desired size if you know the size to allocate anyway.
If you have a big buffer in a vector for example, at some point it might do a reallocation and ends using up 1.5x the memory size that you are intending it to use at some point while reallocating and moving data. (For example, at some point it has N bytes allocated, you add data via append or an insertion iterator and it allocates 2N bytes, copies the first N and releases N. You have 3N bytes allocated at some point).
So in the end it has a lot of advantages, and pays of if you know what you are doing. You should know a little of how C++ works to use it on embedded projects without surprises.
And to the guy of the fixed buffers and reset, you can always reset inside the new operator or whatever if you are out of memory, but that would mean you did a bad design that can exhaust your memory.
An exception being thrown with ARM realview 3.1:
--- OSD\#1504 throw fapi_error("OSDHANDLER_BitBlitFill",res);
S:218E72F0 E1A00000 MOV r0,r0
S:218E72F4 E58D0004 STR r0,[sp,#4]
S:218E72F8 E1A02000 MOV r2,r0
S:218E72FC E24F109C ADR r1,{pc}-0x94 ; 0x218e7268
S:218E7300 E28D0010 ADD r0,sp,#0x10
S:218E7304 FA0621E3 BLX _ZNSsC1EPKcRKSaIcE <0x21a6fa98>
S:218E7308 E1A0B000 MOV r11,r0
S:218E730C E1A0200A MOV r2,r10
S:218E7310 E1A01000 MOV r1,r0
S:218E7314 E28D0014 ADD r0,sp,#0x14
S:218E7318 EB05C35F BL fapi_error::fapi_error <0x21a5809c>
S:218E731C E3A00008 MOV r0,#8
S:218E7320 FA056C58 BLX __cxa_allocate_exception <0x21a42488>
S:218E7324 E58D0008 STR r0,[sp,#8]
S:218E7328 E28D1014 ADD r1,sp,#0x14
S:218E732C EB05C340 BL _ZN10fapi_errorC1ERKS_ <0x21a58034>
S:218E7330 E58D0008 STR r0,[sp,#8]
S:218E7334 E28D0014 ADD r0,sp,#0x14
S:218E7338 EB05C36E BL _ZN10fapi_errorD1Ev <0x21a580f8>
S:218E733C E51F2F98 LDR r2,0x218e63ac <OSD\#1126>
S:218E7340 E51F1F98 LDR r1,0x218e63b0 <OSD\#1126>
S:218E7344 E59D0008 LDR r0,[sp,#8]
S:218E7348 FB056D05 BLX __cxa_throw <0x21a42766>
Doesn't seem so scary, and no overhead is added inside {} blocks or functions if the exception isn't thrown.

The biggest problem with STL in embedded systems is the memory allocation issue (which, as you said, causes a lot of problems).
I'd seriously research creating your own memory management, built by overriding the new/delete operators. I'm pretty sure that with a bit of time, it can be done, and it's almost certainly worth it.
As for the exceptions issue, I wouldn't go there. Exceptions are a serious slowdown of your code, because they cause every single block ({ }) to have code before and after, allowing the catching of the exception and the destruction of any objects contained within. I don't have hard data on this on hand, but every time I've seen this issue come up, I've seen overwhelming evidence of a massive slowdown caused by using exceptions.
Edit:
Since a lot of people wrote comments stating that exception handling is not slower, I thought I'd add this little note (thanks for the people who wrote this in comments, I thought it'd be good to add it here).
The reason exception handling slows down your code is because the compiler must make sure that every block ({}), from the place an exception is thrown to the place it is dealt with, must deallocate any objects within it. This is code that is added to every block, regardless of whether anyone ever throws an exception or not (since the compiler can't tell at compile time whether this block will be part of an exception "chain").
Of course, this might be an old way of doing things that has gotten much faster in newer compilers (I'm not exactly up-to-date on C++ compiler optimizations). The best way to know is just to run some sample code, with exceptions turned on and off (and which includes a few nested functions), and time the difference.

On our embedded scanner project we were developing a board with ARM7 CPU and STL didn't bring any issue. Surely the project details are important since dynamic memory allocation may not be an issue for many available boards today and type of projects.

Related

Does C++ code run faster if there is no structure in program

I know it helps a lot if we structure our programs using classes, structs etc. but does it help in terms of running speed that we avoid these structures and write code plain in terms of basic C++ syntax?
For example, I am trying to write a program that works on vectors. Now it sounds tempting to write a class vector and define its methods like set_at_index(int i) that sets the value of specific row i of this vector. Furthermore I can check whether i<=N where N is the length of the vector in question.
My confusion is that with these routine every set_at_index method that is used a lot will require one 'if' statement. So if I want my code to run faster should I avoid it and go with declaring an array and manually take care that there is no memory leak?
Is there any way I can check for the memory leaks without putting burden on the code speed?
Yes, bounds checking will take slightly more time. But it will take so little extra time that it will only matter if the code is being run 28894389375 times and then it might add up to a millisecond. Note that std::vector only performs bounds checking if you use the at member function, not if you use operator[]. Also, if you are doing anything like writing to a file or printing text to the console, doing that one time will likely take more time than ten million bounds-checked array accesses, because I/O is relatively very very slow.
Typically, without bounds checking code using classes will run at the same speed as code using plain arrays. The problem with manually managing memory like you suggest is that it's easy to forget to clean it up, or to clean it up only through one path of execution through the program, or to fail to clean it up in the event of an exception. It's really hardly ever worth it. Also, it'll be just as fast to use a vector class without bounds checking as it will be to use a dynamic array without bounds checking. You pay for it either way.
I also suggest using std::vector instead of writing your own vector class since they do pretty much every optimisation you could do yourself, and they usually have the advantage of being able to write the code for their specific compiler and perhaps be able to take advantage of things that only that compiler does because they know more of its implementation. The STL classes are also rigorously tested and written by experts (usually).
You should write your code first, then measure with a profiler to see the bottlenecks in your code if it is not fast enough already, then optimise the bottlenecks. I will bet that bounds checking on arrays is probably not going to be one of those bottlenecks.
Checking for memory leaks can be done with a tool like valgrind. You don't do it in the code itself.
Don't try to over optimize before you even start writing. Go ahead and write code that is easily maintainable, readable, and as bug free as possible. Once you have things working, you can start profiling to see the real bottlenecks.
"Premature optimization is root of all evils" - Donald Knuth. (this is true 97% of the time).
Unless you profile your application and see that your class encapsulation is a bottleneck that does slow your application in a significant amount, don't hesitate to have high level structures. It will brings you plenty of benefits like readability, maintainance, and understanding what you do. That's what brings OOP: Big scale programs.
Some good answers have already been posted, and premature optimization is indeed inadvisable as others have said. However, let me put your question in slightly another light.
I know it helps a lot if we structure our programs using classes,
structs etc. but does it help in terms of running speed that we avoid
these structures and write code plain in terms of basic C++ syntax?
Theoretically, most properly written C++ code should run just as fast with fully developed classes as without, but
there are exceptions to the rule;
the effort required to write the C++ code theoretically properly may be too great; and
the same features of the C++ compiler that make it hard to write incorrect code can make it all too easy to write grossly inefficient code.
Point-by-point remarks follow.
Consider a complex three-dimensional vector type, of which each instance consists of six doubles (three real parts and three imaginary parts). If there were not so many doubles, your compiler might load them directly into your microprocessor's registers, but with six they are likely to remain on the stack when the complex three-dimensional vector is loaded. Some operations however on a complex three-dimensional vector do not require all six doubles, but only one, two or three of them. If so, then it might be preferable to store the six floating-point components separately. Thus, rather than an array of 1000 vectors, you'd keep six arrays of 1000 doubles each. Of course, one can (and probably should) bind the arrays together in a class of some kind, but -- for efficiency reasons only -- a good design might never explicitly associate individual elements from one array to another.
Sometimes, you know where your data is and what you want to do with it, and C++'s elaborate organizational and access-control facilities only get in your way. In this case, you might skip the high-level C++ and just do what you want in primitive, hackworthy, brutish, machete-wielding C-style code. Indeed, C++ explicitly supports this style of coding by making it possible -- nay, easy -- to wrap the primitive C code safely within a module and thus to hide its horror from the rest of your beautiful C++ program. Of course, if you hand your code a machete, so to speak, then you take a risk, don't you, because your code may hack up data you never wanted it to, and your compiler will stand aside and let it do it; but sometimes the risk is worth the gain, and sometimes the risk is even fun (and character-building) for a programmer's change of pace.
This point is the most subtle of the three. Where a user-defined type consists partly of other user-defined types, multiple layers of constructors will be called and implicitly invoked. This is great, and usually it is what you want, especially if you have a good unit-testing regime at each layer. The rose however has a thorn, as it were. A properly written constructor is careful never to lose anything it needs. So, unless the programmer is most careful, a constructor may quietly make a lot of strictly unnecessary copies of very large objects. Sometimes, the programmer will mentally lose track of all the levels of implicit invocation, which he never would have done if he had had to handle each invocation explicitly. Also, your data in an object of one type may lack access to a member function to which it can easily gain access, so long is it temporarily copies itself to an object of another type (you can avoid the copy with the use of handle types, reference counting and so forth, but this is not free: it's quite a bit of work). Even if the programmer is conscious of the implicit copy, the implicit copy is so much easier to code in the moment that the temptation to do so is sometimes too great -- especially when a deadline looms! Several hidden inefficiencies can arise in these ways. One can, and should, work around such inefficiencies, of course, but it can take a lot of coding effort to do so and, even then, your compiler is so busy helping you to avoid logical errors that it tends to cause you to create inadvertent inefficiencies that you would never purposely have created. The unnecessary, hidden copying of data is a much bigger problem in C++ than it ever was in C.
All in all, I would say that the C++ trade-off is worth it 80 percent of the time. C++'s organizational and access-control facilities merit the effort it takes to apply them properly. If your question regards the 20 percent, well, there is more than one valid approach to programming, in my view. Sometimes it really does help "that we avoid these structures and write code plain in terms of basic C++ syntax," as you have said.
Usually, no. Sometimes, yes. I think that the earlier answers are right, though, that the particular example you have posed is probably better treated in boring, neat, orderly C++, without tricks.
Two things:
DO NOT do any kind of premature optimization, CPUs are fast nowadays, compilers are smart and able to figure out optimizations that you wouldn't think of in months of looking at your code.
you can easily check things like memory leaks by profiling your code and/or using conditional compilation. Leaks shouldn't occur on release versions so you should just skip that checks.

Most common reasons for unstable bugs in C++?

I am currently working on a large project, and I spend most of the time debugging. While debugging is a normal process, there are bugs, that are unstable, and these bugs are the greatest pain for the developer. The program does not work, well, sometimes... Sometimes it does, and there is nothing you can do about it.
What can be done about these bugs? Most common debugging tools (interactive debuggers, watches, log messages) may lead you nowhere, because the bug will disappear ... just to appear once again, later. That is why I am asking for some heuristics: what are the most common reasons for such bugs? What suspicious code should we investigate to locate such a bugs?
Let me start the list:
using uninitialized variables.
Common misprints like mMember =
mMember;
thread synchronization.
Sometimes it can be a matter of
luck;
working with non-smart
pointers, dereferencing invalid
ones;
what else?
IME the underlying problem in many projects is that developers use low-level features of C++ like manual memory management, C-style string handling, etc. even though they are very rarely ever necessary (and then only well encapsulated in classes). This leads to memory corruption, invalid pointers, buffer overflows, resource leaks and whatnot. All the while nice and clean high-level constructs are available.
I was part of the team for a large (several MLoC) application for several years and the number of crashing bugs for different parts of the application nicely correlated to the programming style used within these parts. When asked why they wouldn't change their programming style some of the culprits answered that their style in general yields more performance. (Not only is this wrong, it's also a fact that customers rather have a more stable but slower program than a fast one that keeps crashing on them. Also, most of their code wasn't even required to be fast...)
As for multi-threading: I don't feel expert enough to offer solutions here, but I think Herb Sutter's Effective Concurrency columns are a very worthwhile read on the subject.
Edit to address the discussions in the comments:
I did not write that "C-style string handling is not more performant". (Certainly a lot of negation in this sentence, but since I feel misread, I try to be precise.) What I said is that high level constructs are not in general less performant: std::vector isn't in general slower than manually doing dynamically allocated C arrays, since it is a dynamically allocated C array. Of course, there are cases where something coded according to special requirements will perform better than any general solution -- but that doesn't necessarily mean you'll have to resort to manual memory management. This is why I wrote that, if such things are necessary, then only well-encapsulated in classes.
But what's even more important: in most code the difference doesn't matter. Whether a button depresses 0.01secs after someone clicked it or 0.05secs simply doesn't matter, so even a factor 5 speed gain is irrelevant in the button's code. Whether the code crashes, however, always matters.
To sum up my argument: First make it work correctly. This is best done using well-proven off-the-shelf building blocks. Then measure. Then improve performance where it matters, using well-proven off-the-shelf idioms.
I was actually going to post a question that asked exactly the opposite - do others find, as I do, that you spend almost no time using the debugger when working with C++? I honestly cannot remember the last time I used one - it must have been about six months ago.
Frankly, if you spend most of the time in the debugger, I think there is something very wrong with your basic coding practices.
Race conditions.
These are one of the few things that still sends a shiver down my spine when it comes up in debugging (or in the issue tracker). Inherently horrible to debug, and extremely easy to create. The three most common causes of bugs in my C++ software have been race conditions, reliance on uninitialised memory, and reliance on static constructor order.
And if you don't know what race conditions are, chances are they're the cause of your instability ;)
If you are really in a position where you already have bad code that breaks, the best plan is probably to throw as many tools at it as you can (OS/lib-level memory checking, automated testing, logging, core dumps, etc) to find the problem areas. Then rewrite the code to do something more deterministic. Most of the bugs come from people doing things that mostly work most of the time, but C++ offers stronger guarantees if you use the right tools and approaches.
Haven't seen this one mentioned yet:
Inheriting from a class that does not have a virtual destructor.
Reading from uncached memory while a cache line is being written back over the memory (This is a right bastard to find).
Buffer overwrites
Stack overflows!
The only 3 i can think of at the mo ... may edit later :)
buffer overflows
using pointers to deleted objects
returning invalid references or references to out of scope objects
unhandled exceptions
resource leaks (not only memory)
infinite recursion
dynamic libraries version mismatch
Not really a C++ issue but seen in a C/C++ project.
The trickiest issue I had to deal with was an initialization issue when starting up the OS on our platform that lead to unusual crashes. It took years before we found out what happened. Before that we ran the system overnight and if it didn't crash, then it was normally okay.
Luckily, the OS isn't sold anymore.
addresses and memory used before allocation or after deallocation, segmentation faults, arrayoutofbounds, offset, threadlocks, unintelligible operator overloading, inline assembly, void exit and void in general where return values are desired complicates where math.h functions are worth a look since all math.h functions both have working arguments and return values compared to other library overly void, emptiness tests, nils, nulls and voids. 4 general conventions I recommend are return values, arguments, ternary choices and invertible changes. Faultprone to avoid are vectors (use arrays instead) void with empty arguments and in my subjective opinion I avoid the switch statement in favor of more intelligible or readable if...elseif or more abstract "is".
C++ also has rather lousy forward compatibility compared to scripts and interpreted, to try a decade old Java it still runs unchanged and safe in later vm.

Will my iPhone app take a performance hit if I use Objective-C for low level code?

When programming a CPU intensive or GPU intensive application on the iPhone or other portable hardware, you have to make wise algorithmic decisions to make your code fast.
But even great algorithm choices can be slow if the language you're using performs more poorly than another.
Is there any hard data comparing Objective-C to C++, specifically on the iPhone but maybe just on the Mac desktop, for performance of various similar language aspects? I am very familiar with this article comparing C and Objective-C, but this is a larger question of comparing two object oriented languages to each other.
For example, is a C++ vtable lookup really faster than an Obj-C message? How much faster? Threading, polymorphism, sorting, etc. Before I go on a quest to build a project with duplicate object models and various test code, I want to know if anybody has already done this and what the results where. This type of testing and comparison is a project in and of itself and can take a considerable amount of time. Maybe this isn't one project, but two and only the outputs can be compared.
I'm looking for hard data, not evangelism. Like many of you I love and hate both languages for various reasons. Furthermore, if there is someone out there actively pursuing this same thing I'd be interesting in pitching in some code to see the end results, and I'm sure others would help out too. My guess is that they both have strengths and weaknesses, my goal is to find out precisely what they are so that they can be avoided/exploited in real-world scenarios.
Mike Ash has some hard numbers for performance of various Objective-C method calls versus C and C++ in his post "Performance Comparisons of Common Operations". Also, this post
by Savoy Software is an interesting read when it comes to tuning the performance of an iPhone application by using Objective-C++.
I tend to prefer the clean, descriptive syntax of Objective-C over Objective-C++, and have not found the language itself to be the source of my performance bottlenecks. I even tend to do things that I know sacrifice a little bit of performance if they make my code much more maintainable.
Yes, well written C++ is considerably faster. If you're writing performance critical programs and your C++ is not as fast as C (or within a few percent), something's wrong. If your ObjC implementation is as fast as C, then something's usually wrong -- i.e. the program is likely a bad example of ObjC OOD because it probably uses some 'dirty' tricks to step below the abstraction layer it is operating within, such as direct ivar accesses.
The Mike Ash 'comparison' is very misleading -- I would never recommend the approach to compare execution times of programs you have written, or recommend it to compare C vs C++ vs ObjC. The results presented are provided from a test with compiler optimizations disabled. A program compiled with optimizations disabled is rarely relevant when you are measuring execution times. To view it as a benchmark which compares C++ against Objective-C is flawed. The test also compares individual features, rather than entire, real world optimized implementations -- individual features are combined in very different ways with both languages. This is far from a realistic performance benchmark for optimized implementations. Examples: With optimizations enabled, IMP cache is as slow as virtual function calls. Static dispatch (as opposed to dynamic dispatch, e.g. using virtual) and calls to known C++ types (where dynamic dispatch may be bypassed) may be optimized aggressively. This process is called devirtualization, and when it is used, a member function which is declared virtual may even be inlined. In the case of the Mike Ash test where many calls are made to member functions which have been declared virtual and have empty bodies: these calls are optimized away entirely when the type is known because the compiler sees the implementation and is able to determine dynamic dispatch is unnecessary. The compiler can also eliminate calls to malloc in optimized builds (favoring stack storage). So, enabling compiler optimizations in any of C, C++, or Objective-C can produce dramatic differences in execution times.
That's not to say the presented results are entirely useless. You could get some useful information about external APIs if you want to determine if there are measurable differences between the times they spend in pthread_create or +[NSObject alloc] on one platform or architecture versus another. Of course, these two examples will be using optimized implementations in your test (unless you happen to be developing them). But for comparing one language to another in programs you compile… the presented results are useless with optimizations disabled.
Object Creation
Consider also object creation in ObjC - every object is allocated dynamically (e.g. on the heap). With C++, objects may be created on the stack (e.g. approximately as fast as creating a C struct and calling a simple function in many cases), on the heap, or as elements of abstract data types. Each time you allocate and free (e.g. via malloc/free), you may introduce a lock. When you create a C struct or C++ object on the stack, no lock is required (although interior members may use heap allocations) and it often costs just a few instructions or a few instructions plus a function call.
As well, ObjC objects are reference counted instances. The actual need for an object to be a std::shared_ptr in performance critical C++ is very rare. It's not necessary or desirable in C++ to make every instance a shared, reference counted instance. You have much more control over ownership and lifetime with C++.
Arrays and Collections
Arrays and many collections in C and C++ also use strongly typed containers and contiguous memory. Since the address of the next element's members are often known, the optimizer can do much more, and you have great cache and memory locality. With ObjC, that's far from reality for standard objects (e.g. NSObject).
Dispatch
Regarding methods, many C++ implementations use few virtual/dynamic calls, particularly in highly optimized programs. These are static method calls and fodder for the optimizers.
With ObjC methods, each method call (objc message send) is dynamic, and is consequently a firewall for the optimizer. Ultimately, that results in many restrictions or inconveniences regarding what you can and cannot do to keep performance at a minimum when writing performance critical ObjC. This may result in larger methods, IMP caching, frequent use of C.
Some realtime applications cannot use any ObjC messaging in their render paths. None -- audio rendering is a good example of this. ObjC dispatch is simply not designed for realtime purposes; Allocations and locks may happen behind the scenes when messaging objects, making the complexity/time of objc messaging unpredictable enough that the audio rendering may miss its deadline.
Other Features
C++ also provides generics/template implementations for many of its libraries. These optimize very well. They are typesafe, and a lot of inlining and optimizations may be made with templates (consider it polymorphism, optimization, and specialization which takes place at compilation). C++ adds several features which just are not available or comparable in strict ObjC. Trying to directly compare langs, objects, and libraries which are very different is not so useful -- it's a very small subset of actual realizations. It's better to expand the question to a library/framework or real program, considering many aspects of design and implementation.
Other Points
C and C++ symbols can be more easily removed and optimized away in various stages of the build (stripping, dead code elimination, inlining and early inlining, as well as Link Time Optimization). The benefits of this include reduced binary sizes, reduced launch/load times, reduced memory consumption, etc.. For a single app, that may not be such a big deal; but if you reuse a lot of code, and you should, then your shared libraries could add a lot of unnecessary weight to the program, if implemented ObjC -- unless you are prepared to jump through some flaming hoops. So scalability and reuse are also factors in medium/large projects, and groups where reuse is high.
Included Libraries
ObjC library implementors also optimize for the environment, so its library implementors can make use of some language and environment features to offer optimized implementations. Although there are some pretty significant restrictions when writing an optimized program in pure ObjC, some highly optimized implementations exist in Cocoa. This is one of Cocoa's strong points, although the C++ standard library (what some people call the STL) is no slouch either. Cocoa operates at a much higher level of abstraction than C++ -- if you don't know well what you're doing (or should be doing), operating closer to the metal can really cost you. Falling back on to a good library implementation if you are not an expert in some domain is a good thing, unless you are really prepared to learn. As well, Cocoa's environments are limited; you can find implementations/optimizations which make better use of the OS.
If you're writing optimized programs and have experience doing so in both C++ and ObjC, clean C++ implementations will often be twice as fast or faster than clean ObjC (yes, you can compare against Cocoa). If you know how to optimize, you can often do better than higher level, general purpose abstractions. Although, some optimized C++ implementations will be as fast as or slower than Cocoa's (e.g. my initial attempt at file I/O was slower than Cocoa's -- primarily because the C++ implementation initializes its memory).
A lot of it comes down to the language features you are familiar with. I use both langs, they both have different strengths and models/patterns. They complement each other quite well, and there are great libraries for both. If you're implementing a complex, performance critical program, correct use of C++'s features and libraries will give you much more control and provide significant advantages for optimization, such that in the right hands, "several times faster" is a good default expectation (don't expect to win every time, or without some work, however). Remember, it takes years to understand C++ well enough to really reach that point.
I keep the majority of my performance critical paths as C++, but also recognize that ObjC is also a very good solution for some problems, and that there are some very good libraries available.
It's very hard to collect "hard data" for this that's not misguiding.
The biggest problem with doing a feature-to-feature comparison like you suggest is that the two languages encourage very different coding styles. Objective-C is a dynamic language with duck typing, where typical C++ usage is static. The same object-oriented architecture problem would likely have very different ideal solutions using C++ or Objective-C.
My feeling (as I have programmed much in both languages, mostly on huge projects): To maximize Objective-C performance, it has to be written very close to C. Whereas with C++, it's possible to make much more use of the language without any performance penalty compared to C.
Which one is better? I don't know. For pure performance, C++ will always have the edge. But the OOP style of Objective-C definitely has its merits. I definitely think it is easier to keep a sane architecture with it.
This really isn't something that can be answered in general as it really depends on how you use the language features. Both languages will have things that they are fast at, things that they are slow at, and things that are sometimes fast and sometimes slow. It really depends on what you use and how you use it. The only way to be certain is to profile your code.
In Objective C you can also write c++ code, so it might be easier to code in Objective C for the most part, and if you find something that doesn't perform well in it, then you can have a go at writting a c++ version of it and seeing if that helps (C++ tends to optimize better at compile time). Objective C will be easier to use if APIs you are interfacing with are also written in it, plus you might find it's style of OOP is easier or more flexible.
In the end, you should go with what you know you can write safe, robust code in and if you find an area that needs special attention from the other language, then you can swap to that. X-Code does allow you to compile both in the same project.
I have a couple of tests I did on an iPhone 3G almost 2 years ago, there was no documentation or hard numbers around in those days. Not sure how valid they still are but the source code is posted and attached.
This isn't a very extensive test, I was mainly interested in NSArray vs C Array for iterating a large number of objects.
http://memo.tv/nsarray_vs_c_array_performance_comparison
http://memo.tv/nsarray_vs_c_array_performance_comparison_part_ii_makeobjectsperformselector
You can see the C Array is much faster at high iterations. Since then I've realized that the bottleneck is probably not the iteration of the NSArray but the sending of the message. I wanted to try methodForSelector and calling the methods directly to see how big the difference would be but never got round to it. According to Mike Ash's benchmarks it's just over 5x faster.
I don't have hard data for Objective C, but I do have a good place to look for C++.
C++ started as C with Classes according to Bjarne Stroustroup in his reflection on the early years of C++ (http://www2.research.att.com/~bs/hopl2.pdf), so C++ can be thought of (like Objective C) as pushing C to its limits for object orientation.
What are those limits? In the 1994-1997 time frame, a lot of researchers figured out that object-orientation came at a cost due to dynamic binding, e.g. when C++ functions are marked virtual and there may/may not be children classes that override these functions. (In Java and C#, all functions expect ctors are inherently virtual, and there isnt' much you can do about it.) In "A Study of Devirtualization Techniques for a Java Just-In-Time Compiler" from researchers at IBM Research Tokyo, they contrast the techniques used to deal with this, including one from Urz Hölzle and Gerald Aigner. Urz Hölzle, in a separate paper with Karel Driesen, had shown that on average 5.7% of time in C++ programs (and up to ~50%) was spent in calling virtual functions (e.g. vtables + thunks). He later worked with some Smalltalk researachers in what ended up the Java HotSpot VM to solve these problems in OO. Some of these features are being backported to C++ (e.g. 'protected' and Exception handling).
As I mentioned, C++ is static typed where Objective C is duck typed. The performance difference in execution (but not lines of code) probably is a result of this difference.
This study says to really get the performance in a CPU intensive game, you have to use C. The linked article is complete with a XCode project that you can run.
I believe the bottom line is: Use Objective-C where you must interact with the iPhone's functions (after all, putting trampolines everywhere can't be good for anyone), but when it comes to loops, things like vector object classes, or intensive array access, stick with C++ STL or C arrays to get good performance.
I mean it would be totally silly to see position = [[Vector3 alloc] init] ;. You're just asking for a performance hit if you use references counts on basic objects like a position vector.
yes. c++ reign supreme in performance/expresiveness/resource tradeoff.
"I'm looking for hard data, not evangelism". google is your best friend.
obj-c nsstring is swapped with c++'s by apple enginneers for performance. in a resource constrained devices, only c++ cuts it as a MAINSTREAM oop language.
NSString stringWithFormat is slow
obj-c oop abstraction is deconstructed into procedural-based c-structs for performance, otherwise a MAGNITUDE order slower than java! the author is also aware of message caching - yet no-go. so modeling lots of small players/enemies objects is done in oop with c++ or else, lots of Procedural structs with a simple OOP wrapper around it with obj-c. there can be one paradigm that equates Procedural + Object-Oriented Programming = obj-c.
http://ejourneyman.wordpress.com/2008/04/23/writing-a-ray-tracer-for-cocoa-objective-c/

C++ for Game Programming - Love or Distrust? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
In the name of efficiency in game programming, some programmers do not trust several C++ features. One of my friends claims to understand how game industry works, and would come up with the following remarks:
Do not use smart pointers. Nobody in games does.
Exceptions should not be (and is usually not) used in game programming for memory and speed.
How much of these statements are true? C++ features have been designed keeping efficiency in mind. Is that efficiency not sufficient for game programming? For 97% of game programming?
The C-way-of-thinking still seems to have a good grasp on the game development community. Is this true?
I watched another video of a talk on multi-core programming in GDC 2009. His talk was almost exclusively oriented towards Cell Programming, where DMA transfer is needed before processing (simple pointer access won't work with the SPE of Cell). He discouraged the use of polymorphism as the pointer has to be "re-based" for DMA transfer. How sad. It is like going back to the square one. I don't know if there is an elegant solution to program C++ polymorphism on the Cell. The topic of DMA transfer is esoteric and I do not have much background here.
I agree that C++ has also not been very nice to programmers who want a small language to hack with, and not read stacks of books. Templates have also scared the hell out of debugging. Do you agree that C++ is too much feared by the gaming community?
The last game I worked on was Heavenly Sword on the PS3 and that was written in C++, even the cell code. Before that, I did some PS2 games and PC games and they were C++ as well. Non of the projects used smart pointers. Not because of any efficiency issues but because they were generally not needed. Games, especially console games, do not do dynamic memory allocation using the standard memory managers during normal play. If there are dynamic objects (missiles, enemies, etc) then they are usually pre-allocated and re-used as required. Each type of object would have an upper limit on the number of instances the game can cope with. These upper limits would be defined by the amount of processing required (too many and the game slows to a crawl) or the amount of RAM present (too much and you could start frequently paging to disk which would seriously degrade performance).
Games generally don't use exceptions because, well, games shouldn't have bugs and therefore not be capable of generating exceptions. This is especially true of console games where games are tested by the console manufacturer, although recent platforms like 360 and PS3 do appear to have a few games that can crash. To be honest, I've not read anything online about what the actual cost of having exceptions enabled is. If the cost is incurred only when an exception is thrown then there is no reason not to use them in games, but I don't know for sure and it's probably dependant on the compiler used. Generally, game programmers know when problems can occur that would be handled using an exception in a business application (things like IO and initialisation) and handle them without the use of exceptions (it is possible!).
But then, in the global scale, C++ is slowly decreasing as a language for game development. Flash and Java probably have a much bigger slice of market and they do have exceptions and smart pointers (in the form of managed objects).
As for the Cell pointer access, the problems arise when the code is being DMA'd into the Cell at an arbitrary base addresses. In this instance, any pointers in the code need to be 'fixed up' with the new base address, this includes v-tables, and you don't really want to do this for every object you load into the Cell. If the code is always loaded at a fixed address, then there is never a need to fix-up the pointers. You lose a bit of flexibility though as you're limiting where code can be stored. On a PC, the code never moves during execution so pointer fix-up at runtime is never needed.
I really don't think anyone 'distrusts' C++ features - not trusting the compiler is something else entirely and quite often new, esoteric architectures like the Cell tend to get robust C compilers before C++ ones because a C compiler is much easier to make than a C++ one.
Look, most everything you hear anyone say about efficiency in programming is magical thinking and superstition. Smart pointers do have a performance cost; especially if you're doing a lot of fancy pointer manipulations in an inner loop, it could make a difference.
Maybe.
But when people say things like that, it's usually the result of someone who told them long ago that X was true, without anything but intuition behind it. Now, the Cell/polymorphism issue sounds plausible — and I bet it did to the first guy who said it. But I haven't verified it.
You'll hear the very same things said about C++ for operating systems: that it is too slow, that it does things you want to do well, badly.
None the less we built OS/400 (from v3r6 forward) entirely in C++, bare-metal on up, and got a code base that was fast, efficient, and small. It took some work; especially working from bare metal, there are some bootstrapping issues, use of placement new, that kind of thing.
C++ can be a problem just because it's too damn big: I'm rereading Stroustrup's wristbreaker right now, and it's pretty intimidating. But I don't think there's anything inherent that says you can't use C++ in an effective way in game programming.
If you or your friend are really paranoid about performance, then go read the Intel manuals on optimization. Fun.
Otherwise, go for correctness, reliability and maintainability every time. I'd rather have a game that ran a bit slowly than one that crashed. If/when you notice that you have performance issues, PROFILE and then optimize. You will likely find that theres some hotspot piece of code which can possibly be made more efficient by using a more efficient data structure or algorithm. Only bother about these silly little mico-optimization when profiling shows that they're the only way you can get a worthwhile speedup.
So:
Write code to be clear and correct
Profile
PROFILE
Can you use more efficient data structures or algorithms to speed up the bottleneck?
Use micro-optimizations as a last resort and only where profiling showed it would help
PS: A lot of modern C++ compilers provide an exception handling mechanism which adds zero execution overhead UNLESS an exception is thrown. That is, performance is only reduced when an exception is actually thrown. As long as exceptions are only used for exceptional circumstances, then theres no good reason not to use them.
I saw a post on StackOverflow (that I cannot seem to find anymore, so maybe it wasn't posted here) that looked at the relative cost of exceptions vs. error codes. Too often people look at "code with exceptions" vs. "code without error handling", which is not a fair comparison. If you would use exceptions, then by not using them you have to use something else for the same functionality, and that other thing is usually error return codes. They found that even in a simple example with a single level of function calls (so no need to propagate exceptions far up the call stack), exceptions were faster than error codes in cases where the error situation occurred 0.1% - 0.01% of the time or less, while error codes were faster in the opposite situation.
Similar to the above complaint about measuring exceptions vs. no error handling, people do this sort of error in reasoning even more often with regard to virtual functions. And just like you don't use exceptions as a way to return dynamic types from a function (yes, I know, all of your code is exceptional), you don't make functions virtual because you like the way it looks in your syntax highlighter. You make functions virtual because you need a particular type of behavior, and so you can't say that virtualization is slow unless you compare it with something that has the same action, and generally the replacement is either lots of switch statements or lots of code duplication. Those have performance and memory hits as well.
As for the comment that games don't have bugs and other software does, all I can say to that is that I clearly have not played any games made by their software company. I've surfed on the floor of the elite 4 in Pokemon, gotten stuck inside of a mountain in Oblivion, been killed by Gloams that accidentally combine their mana damage with their hp damage instead of doing them separately in Diablo II, and pushed myself through a closed gate with a big rock to fight Goblins with a bird and a slingshot in Twilight Princess. Software has bugs. Using exceptions doesn't make bug-free software buggy.
The standard library's exception mechanisms have two types of exceptions: std::runtime_error and std::logic_error. I could see not wanting to use std::logic_error (I've used it as a temporary thing to help me test, with the goal of removing it eventually, and I've also left it in as a permanent check). std::runtime_error, however, is not a bug. I throw an exception derived from std::runtime_error if the server I am connected to sends me invalid data (rule #1 of secure programming: trust no one, even a server that you think you wrote), such as claiming that they are sending me a message of 12 bytes and then they actually send me 15. In such a situation, there are only two possibilities:
1) I am connected to a malicious server, or
2) My connection to the server is corrupted.
In both of these cases, my response is the same: Disconnect (no matter where I am in the code, because my destructors will clean things up for me), wait a couple of seconds, and try connecting to the server again. I cannot do anything else. I could give absolutely everything an error code (which implies passing everything else by reference, which is a performance hit, and severely clutters code), or I could throw an exception that I catch at a point in my code where I determine which servers to connect to (which will probably be very high up in my code).
Is any of what I mentioned a bug in my code? I don't think so; I think it's accepting that all of the other code I have to interface with is imperfect or malicious, and making sure my code remains performant in the face of such ambiguity.
For smart pointers, again, what is the functionality you are trying to implement? If you need the functionality of smart pointers, then not using smart pointers means rewriting their functionality manually. I think it's pretty obvious why this is a bad idea. However, I rarely use smart pointers in my own code. The only time I really do is if I need to store some polymorphic class in a standard container (say, std::map<BattleIds, Battles> where Battles is some base class that is derived from based on the type of battle), in which case I used a std::unique_ptr. I believe that one time I used a std::unique_ptr in a class to work with some library code. Much of the time that I am using std::unique_ptr, it's to make a non-copyable, non-movable type movable. In many cases where you would use a smart pointer, however, it seems like a better idea to just create the object on the stack and remove the pointer from the equation entirely.
In my personal coding, I haven't really found many situations where the "C" version of the code is faster than the "C++" version. In fact, it's generally the opposite. For instance, consider the many examples of std::sort vs. qsort (a common example used by Bjarne Stroustrup) where std::sort clobbers qsort, or my recent comparison of std::copy vs. memcpy, where std::copy actually has a slight performance advantage.
Too much of the "C++ feature X is too slow" claims seem to be based on comparing it to not having the functionality. The most performant (in terms of speed and memory) and bug-free code is int main() {}, but we write programs to do things. If you need particular functionality, it would be silly not to use the features of the language that give you that functionality. However, you should start by thinking of what you want your program to do, and then find the best way to do it. Obviously you don't want to begin with "I want to write a program that uses feature X of C++", you want to begin with "I want to write a program that does cool thing Z" and maybe you end up at "...and the best way to implement that is feature X".
Lots of people make absolute statements about things, because they don't actually think. They'd rather just apply a rule, making things more tedious, but requiring less design and forethought. I'd rather have a bit of hard thinking now and then when I'm doing something hairy, and abstract away the tedium, but I guess not everyone thinks that way. Sure, smart pointers have a performance cost. So do exceptions. That just means there may be some small portions of your code where you shouldn't use them. But you should profile first and make sure that's actually what the problem is.
Disclaimer: I've never done any game programming.
Regarding the Cell architecture: it has an incoherent cache. Each SPE has its own local store of 256 KB. The SPEs can only access this memory; any other memory, such as the 512 MB of main memory or the local store of another SPE, has to be accessed with DMA. You perform the DMA manually and copy the memory into your local store by explicitly initiating a DMA transfer. This makes synchronization a huge pain.
Alternatively, you actually can access other memory. Main memory and each SPE's local store is mapped to a certain section of the 64-bit virtual address space. If you access data through the right pointers, the DMA happens behind the scenes, and it all looks like one giant shared memory space. The problem? Huge performance hit. Every time you access one of these pointers, the SPE stalls while the DMA occurs. This is slow, and it's not something you want to do in performance-critical code (i.e. a game).
This brings us to Skizz's point about vtables and pointer fixups. If you're blindly copying around vtable pointers between SPEs, you're going to incur a huge performance hit if you don't fix up your pointers, and you're also going to incur a huge performance hit if you do fix up your pointers and download the virtual function code to the SPEs.
I ran across an excellent presentation by Sony called "Pitfalls of Object Oriented Programming". This generation of console hardware has really made a number of people take a second look at the OO aspects of C++ and start asking questions about whether it's really the best way forward.
You can find the presentation here (direct link here). Maybe you'll find the example a bit contrived, but hopefully you'll see that this dislike of highly abstracted object oriented designs isn't always based on myth and superstition.
I have written small games in the past with C++ and use C++ currently for other high performance applications. There is no need to use every single C++ feature throughout the whole code base.
Because C++ is (pretty much, minus a few things) a superset of C, you can write C style code where required, while taking advantage of the extra C++ features where appropriate.
Given a decent compiler, C++ can be just as quick as C because you can write "C" code in C++.
And as always, profile the code. Algorithms and memory management generally have a greater impact on performance than using some C++ feature.
Many games also embed Lua or some other scripting language into the game engine, so obviously maximum performance isn't required for every single line of code.
I have never programmed or used a Cell so that may have further restrictions etc.
C++ is not feared by the gaming community. Having worked on an open-world game engine selling millions, I can say the people in the business are extremely skilled and knowledgable.
The fact that shared_ptr isn't used extensively is partly because there is a real cost to it, but more importantly because ownership isn't very clear. Ownership and resource management is one of the most important and hardest things to get right. Partly because resources are still scarce on console, but also since most difficult bugs tend to be related to unclear resource management (e.g. who and what controls the lifetime of an object). IMHO shared_ptr doesn't help with that the least.
There is an added cost to exception handling, which makes it just not worthwhile. In the final game, no exceptions should be thrown anyway - it's better to just crash than to throw an exception. Plus, it's really hard to ensure exception safety in C++ anyway.
But there are many other parts of C++ that are used extensively in the gaming business. Inside EA, EASTL is an amazing remake of STL that is very adapted for high performance and scarce resources.
There is an old saying about Generals being fully prepared to fight the last war not the next.
Something similar is true about most advice on performance. It usually relates to the software and hardware that was availbale five years ago.
Kevin Frei wrote an interesting document, “How much does Exception Handling cost, really?”.
It really depends on the type of game too. If it's a processor-light game (like an asteroids clone) or pretty much anything in 2d, you can get away with more. Sure, smart pointers cost more than regular pointers, but if some people are writing games in C# then smart pointers are definitely not going to be a problem. And exceptions not being used in games is probably true, but many people misuse exceptions anyways. Exceptions should only be used for exceptional circumstances..not expected errors.
I also heard it before I joined the game industry, but something I've found is that the compilers for specialized game hardware are sometimes... subpar. (I've personally only worked with the major consoles, but I'm sure it's even more true for devices like cell phones and the like.) Obviously this isn't really a huge issue if you're developing for PC, where the compilers are tried, true, and abundant in variety, but if you want to develop a game for the Wii, PS3, or X360, guess how many options you have and how well tested they are versus your Windows/Unix compiler of choice.
This isn't to say that the tools are necessarily awful, of course, but they're only guaranteed to work if your code is simple -- in essence, if you program in C. This doesn't mean that you can't use a class or create a RAII-using smart pointer, but the further from that "guaranteed" functionality you get, the shakier the support for the standard becomes. I've personally written a line of code using some templates that compiled for one platform but not on another -- one of them simply didn't support some fringe case in the C++ standard properly.
Some of it is undoubtedly game programmer folklore, but chances are it came from somewhere: Some old compiler unwound the stack strangely when exceptions were thrown, so we don't use exceptions; A certain platform didn't play with templates well, so we only use them in trivial cases; etc. Unfortunately the problem cases and where they occurred never seem to be written down anywhere (and the cases are frequently esoteric and were a pain to track down when they first occurred), so there's no easy way to verify if it's still an issue or not except to try and hope you don't get hurt as a result. Needless to say, this is easier said than done, so the hesitance continues.
Exception handling is never free, despite some claims to the contrary on here. There is ALWAYS a cost whether it be memory or speed. If it has zero performance cost, there will be a high memory cost. Either way, the method used is totally compiler dependant and, therefore, out of the developers control. Neither method is good for game development since a. the target platform has a finite amount of memory that is often never enough and, therefore, we need total control over, and b. a fixed performance constraint of 30/60Hz. It's OK for a PC app or tool to slow down momentarily whilst something gets processed but this is absolutely untolerable on a console game. There are physics and graphics systems etc. that depend on a consistent framerate, so any C++ "feature" that could potentially disrupt this - and cannot be controlled by the developer - is a good candidate for being dropped. If C++ exception handling was so good, with little or no performance/memory cost, it would be used in every program, there wouldn't even be an option to disable it. The fact is, it may be a neat and tidy way to write reliable PC application code but is surplus to requirement in game development. It bulks out the executable, costs memory and/or performance and is totally unoptimizable. This is fine for PC dev that have huge instruction caches and the like, but game consoles do not have this luxury. And even if they did, the game dev community would almost certainly rather spend the extra cycles/memory on game related resources than waste it on features of C++ that we don't need.
Some of this is gaming folklore, and maybe mantras passed down from game developers who were targeting very limited devices (mobile, e.g.) to gamedevs who weren't.
However, a thing to keep in mind about games is that their performance characteristics are dominated by smooth and predictable frame rates. They're not mission-critical software, but they are "FPS-critical" software. A hiccup in frame rates could cause the player to game over in an action game, e.g. As a result, as much as you might find some healthy level of paranoia in mission-critical software about not failing, you can likewise find something similar in gaming about not stuttering and lagging.
A lot of gamedevs I've talked to also don't even like virtual memory and I've seen them try to apply ways to minimize the probability that a page fault could occur at an inconvenient time. In other fields, people might love virtual memory, but games are "FPS-critical". They don't want any kind of weird hiccup or stutter to occur somewhere during gameplay.
So if we start with exceptions, modern implementations of zero-cost EH allow normal execution paths to execute faster than if they were to perform manual branching on error conditions. But they come at the cost that throwing an exception suddenly becomes a much more expensive, "stop the world" kind of event. That kind of "stop the world" thing can be disastrous to a software seeking the most predictable and smooth frame rates. Of course that's only supposed to be reserved for truly exceptional paths, but a game might prefer to just find reasons not to face exceptional paths since the cost of throwing would be too great in the middle of a game. Graceful recovery is kind of a moot concept if the game has a strong desire to be avoiding facing exceptional paths in the first place.
Games often have this kind of "startup and go" characteristic, where they can potentially do all their file loading and memory allocation and things like that which could fail in advance on loading up the level or starting the game instead of doing things that could fail in the middle of the game. As a result they don't necessarily have that many decentralized code paths that could or should encounter an exception and that also diminishes the benefits of EH since it doesn't become so convenient if there are only a select few areas maximum that might benefit from it.
For similar reasons to EH, gamedevs often dislike garbage collection since it can also have that kind of "stop the world" event which can lead to unpredictable stutters -- the briefest of stutters that might be easy to dismiss in many domains as harmless, but not to gamedevs. As a result they might avoid it outright or seek object pooling just to prevent GC collections from occurring at inopportune times.
Avoiding smart pointers outright seems a bit more extreme to me, but a lot of games can preallocate their memory in advance or they might use an entity-component system where every component type is stored in a random-access sequence which allows them to be indexed. Smart pointers imply heap allocations and things that own memory at the granular level of a single object (at least unless you use a custom allocator and custom delete function object), and most games might find it in their best interest to avoid such granular heap allocations and instead allocate many things at once in a large container or through a memory pool.
There might be a bit of superstition here but I think some of it is at least justifiable.

Why doesn't C++ have a garbage collector?

I'm not asking this question because of the merits of garbage collection first of all. My main reason for asking this is that I do know that Bjarne Stroustrup has said that C++ will have a garbage collector at some point in time.
With that said, why hasn't it been added? There are already some garbage collectors for C++. Is this just one of those "easier said than done" type things? Or are there other reasons it hasn't been added (and won't be added in C++11)?
Cross links:
Garbage collectors for C++
Just to clarify, I understand the reasons why C++ didn't have a garbage collector when it was first created. I'm wondering why the collector can't be added in.
Implicit garbage collection could have been added in, but it just didn't make the cut. Probably due to not just implementation complications, but also due to people not being able to come to a general consensus fast enough.
A quote from Bjarne Stroustrup himself:
I had hoped that a garbage collector
which could be optionally enabled
would be part of C++0x, but there were
enough technical problems that I have
to make do with just a detailed
specification of how such a collector
integrates with the rest of the
language, if provided. As is the case
with essentially all C++0x features,
an experimental implementation exists.
There is a good discussion of the topic here.
General overview:
C++ is very powerful and allows you to do almost anything. For this reason it doesn't automatically push many things onto you that might impact performance. Garbage collection can be easily implemented with smart pointers (objects that wrap pointers with a reference count, which auto delete themselves when the reference count reaches 0).
C++ was built with competitors in mind that did not have garbage collection. Efficiency was the main concern that C++ had to fend off criticism from in comparison to C and others.
There are 2 types of garbage collection...
Explicit garbage collection:
C++0x has garbage collection via pointers created with shared_ptr
If you want it you can use it, if you don't want it you aren't forced into using it.
For versions before C++0x, boost:shared_ptr exists and serves the same purpose.
Implicit garbage collection:
It does not have transparent garbage collection though. It will be a focus point for future C++ specs though.
Why Tr1 doesn't have implicit garbage collection?
There are a lot of things that tr1 of C++0x should have had, Bjarne Stroustrup in previous interviews stated that tr1 didn't have as much as he would have liked.
To add to the debate here.
There are known issues with garbage collection, and understanding them helps understanding why there is none in C++.
1. Performance ?
The first complaint is often about performance, but most people don't really realize what they are talking about. As illustrated by Martin Beckett the problem may not be performance per se, but the predictability of performance.
There are currently 2 families of GC that are widely deployed:
Mark-And-Sweep kind
Reference-Counting kind
The Mark And Sweep is faster (less impact on overall performance) but it suffers from a "freeze the world" syndrome: i.e. when the GC kicks in, everything else is stopped until the GC has made its cleanup. If you wish to build a server that answers in a few milliseconds... some transactions will not live up to your expectations :)
The problem of Reference Counting is different: reference-counting adds overhead, especially in Multi-Threading environments because you need to have an atomic count. Furthermore there is the problem of reference cycles so you need a clever algorithm to detect those cycles and eliminate them (generally implement by a "freeze the world" too, though less frequent). In general, as of today, this kind (even though normally more responsive or rather, freezing less often) is slower than the Mark And Sweep.
I have seen a paper by Eiffel implementers that were trying to implement a Reference Counting Garbage Collector that would have a similar global performance to Mark And Sweep without the "Freeze The World" aspect. It required a separate thread for the GC (typical). The algorithm was a bit frightening (at the end) but the paper made a good job of introducing the concepts one at a time and showing the evolution of the algorithm from the "simple" version to the full-fledged one. Recommended reading if only I could put my hands back on the PDF file...
2. Resources Acquisition Is Initialization (RAII)
It's a common idiom in C++ that you will wrap the ownership of resources within an object to ensure that they are properly released. It's mostly used for memory since we don't have garbage collection, but it's also useful nonetheless for many other situations:
locks (multi-thread, file handle, ...)
connections (to a database, another server, ...)
The idea is to properly control the lifetime of the object:
it should be alive as long as you need it
it should be killed when you're done with it
The problem of GC is that if it helps with the former and ultimately guarantees that later... this "ultimate" may not be sufficient. If you release a lock, you'd really like that it be released now, so that it does not block any further calls!
Languages with GC have two work arounds:
don't use GC when stack allocation is sufficient: it's normally for performance issues, but in our case it really helps since the scope defines the lifetime
using construct... but it's explicit (weak) RAII while in C++ RAII is implicit so that the user CANNOT unwittingly make the error (by omitting the using keyword)
3. Smart Pointers
Smart pointers often appear as a silver bullet to handle memory in C++. Often times I have heard: we don't need GC after all, since we have smart pointers.
One could not be more wrong.
Smart pointers do help: auto_ptr and unique_ptr use RAII concepts, extremely useful indeed. They are so simple that you can write them by yourself quite easily.
When one need to share ownership however it gets more difficult: you might share among multiple threads and there are a few subtle issues with the handling of the count. Therefore, one naturally goes toward shared_ptr.
It's great, that's what Boost for after all, but it's not a silver bullet. In fact, the main issue with shared_ptr is that it emulates a GC implemented by Reference Counting but you need to implement the cycle detection all by yourself... Urg
Of course there is this weak_ptr thingy, but I have unfortunately already seen memory leaks despite the use of shared_ptr because of those cycles... and when you are in a Multi Threaded environment, it's extremely difficult to detect!
4. What's the solution ?
There is no silver bullet, but as always, it's definitely feasible. In the absence of GC one need to be clear on ownership:
prefer having a single owner at one given time, if possible
if not, make sure that your class diagram does not have any cycle pertaining to ownership and break them with subtle application of weak_ptr
So indeed, it would be great to have a GC... however it's no trivial issue. And in the mean time, we just need to roll up our sleeves.
What type? should it be optimised for embedded washing machine controllers, cell phones, workstations or supercomputers?
Should it prioritise gui responsiveness or server loading?
should it use lots of memory or lots of CPU?
C/c++ is used in just too many different circumstances.
I suspect something like boost smart pointers will be enough for most users
Edit - Automatic garbage collectors aren't so much a problem of performance (you can always buy more server) it's a question of predicatable performance.
Not knowing when the GC is going to kick in is like employing a narcoleptic airline pilot, most of the time they are great - but when you really need responsiveness!
One of the biggest reasons that C++ doesn't have built in garbage collection is that getting garbage collection to play nice with destructors is really, really hard. As far as I know, nobody really knows how to solve it completely yet. There are alot of issues to deal with:
deterministic lifetimes of objects (reference counting gives you this, but GC doesn't. Although it may not be that big of a deal).
what happens if a destructor throws when the object is being garbage collected? Most languages ignore this exception, since theres really no catch block to be able to transport it to, but this is probably not an acceptable solution for C++.
How to enable/disable it? Naturally it'd probably be a compile time decision but code that is written for GC vs code that is written for NOT GC is going to be very different and probably incompatible. How do you reconcile this?
These are just a few of the problems faced.
Though this is an old question, there's still one problem that I don't see anybody having addressed at all: garbage collection is almost impossible to specify.
In particular, the C++ standard is quite careful to specify the language in terms of externally observable behavior, rather than how the implementation achieves that behavior. In the case of garbage collection, however, there is virtually no externally observable behavior.
The general idea of garbage collection is that it should make a reasonable attempt at assuring that a memory allocation will succeed. Unfortunately, it's essentially impossible to guarantee that any memory allocation will succeed, even if you do have a garbage collector in operation. This is true to some extent in any case, but particularly so in the case of C++, because it's (probably) not possible to use a copying collector (or anything similar) that moves objects in memory during a collection cycle.
If you can't move objects, you can't create a single, contiguous memory space from which to do your allocations -- and that means your heap (or free store, or whatever you prefer to call it) can, and probably will, become fragmented over time. This, in turn, can prevent an allocation from succeeding, even when there's more memory free than the amount being requested.
While it might be possible to come up with some guarantee that says (in essence) that if you repeat exactly the same pattern of allocation repeatedly, and it succeeded the first time, it will continue to succeed on subsequent iterations, provided that the allocated memory became inaccessible between iterations. That's such a weak guarantee it's essentially useless, but I can't see any reasonable hope of strengthening it.
Even so, it's stronger than what has been proposed for C++. The previous proposal [warning: PDF] (that got dropped) didn't guarantee anything at all. In 28 pages of proposal, what you got in the way of externally observable behavior was a single (non-normative) note saying:
[ Note: For garbage collected programs, a high quality hosted implementation should attempt to maximize the amount of unreachable memory it reclaims. —end note ]
At least for me, this raises a serious question about return on investment. We're going to break existing code (nobody's sure exactly how much, but definitely quite a bit), place new requirements on implementations and new restrictions on code, and what we get in return is quite possibly nothing at all?
Even at best, what we get are programs that, based on testing with Java, will probably require around six times as much memory to run at the same speed they do now. Worse, garbage collection was part of Java from the beginning -- C++ places enough more restrictions on the garbage collector that it will almost certainly have an even worse cost/benefit ratio (even if we go beyond what the proposal guaranteed and assume there would be some benefit).
I'd summarize the situation mathematically: this a complex situation. As any mathematician knows, a complex number has two parts: real and imaginary. It appears to me that what we have here are costs that are real, but benefits that are (at least mostly) imaginary.
If you want automatic garbage collection, there are good commercial
and public-domain garbage collectors for C++. For applications where
garbage collection is suitable, C++ is an excellent garbage collected
language with a performance that compares favorably with other garbage
collected languages. See The C++ Programming Language (4rd
Edition) for a discussion of automatic garbage collection in C++.
See also, Hans-J. Boehm's site for C and C++ garbage collection (archive).
Also, C++ supports programming techniques that allow memory
management to be safe and implicit without a garbage collector. I consider garbage collection a last choice and an imperfect way of handling for resource management. That does not mean that it is never useful, just that there are better approaches in many situations.
Source: http://www.stroustrup.com/bs_faq.html#garbage-collection
As for why it doesnt have it built in, If I remember correctly it was invented before GC was the thing, and I don't believe the language could have had GC for several reasons(I.E Backwards compatibilty with C)
Hope this helps.
tl;dr: Because modern C++ doesn't need garbage collection.
Bjarne Stroustrup's FAQ answer on this matter says:
I don't like garbage. I don't like littering. My ideal is to eliminate the need for a garbage collector by not producing any garbage. That is now possible.
The situation, for code written these days (C++17 and following the official Core Guidelines) is as follows:
Most memory ownership-related code is in libraries (especially those providing containers).
Most use of code involving memory ownership follows the CADRe or RAII pattern, so allocation is made on construction and deallocation on destruction, which happens when exiting the scope in which something was allocated.
You do not explicitly allocate or deallocate memory directly.
Raw pointers do not own memory (if you've followed the guidelines), so you can't leak by passing them around.
If you're wondering how you're going to pass the starting addresses of sequences of values in memory - you can and should prefer span's, obviating the need for raw pointers. You can still use such pointers, they'll just be non-owning.
If you really need an owning "pointer", you use C++' standard-library smart pointers - they can't leak, and are decently efficient (although the ABI can get in the way of that). Alternatively, you can pass ownership across scope boundaries with "owner pointers". These are uncommon and must be used explicitly; but when adopted - they allow for nice static checking against leaks.
"Oh yeah? But what about...
... if I just write code the way we used to write C++ in the old days?"
Indeed, you could just disregard all of the guidelines and write leaky application code - and it will compile and run (and leak), same as always.
But it's not a "just don't do that" situation, where the developer is expected to be virtuous and exercise a lot of self control; it's just not simpler to write non-conforming code, nor is it faster to write, nor is it better-performing. Gradually it will also become more difficult to write, as you would face an increasing "impedance mismatch" with what conforming code provides and expects.
... if I reintrepret_cast? Or do complex pointer arithmetic? Or other such hacks?"
Indeed, if you put your mind to it, you can write code that messes things up despite playing nice with the guidelines. But:
You would do this rarely (in terms of places in the code, not necessarily in terms of fraction of execution time)
You would only do this intentionally, not accidentally.
Doing so will stand out in a codebase conforming to the guidelines.
It's the kind of code in which you would bypass the GC in another language anyway.
... library development?"
If you're a C++ library developer then you do write unsafe code involving raw pointers, and you are required to code carefully and responsibly - but these are self-contained pieces of code written by experts (and more importantly, reviewed by experts).
So, it's just like Bjarne said: There's really no motivation to collect garbage generally, as you all but make sure not to produce garbage. GC is becoming a non-problem with C++.
That is not to say GC isn't an interesting problem for certain specific applications, when you want to employ custom allocation and de-allocations strategies. For those you would want custom allocation and de-allocation, not a language-level GC.
Stroustrup made some good comments on this at the 2013 Going Native conference.
Just skip to about 25m50s in this video. (I'd recommend watching the whole video actually, but this skips to the stuff about garbage collection.)
When you have a really great language that makes it easy (and safe, and predictable, and easy-to-read, and easy-to-teach) to deal with objects and values in a direct way, avoiding (explicit) use of the heap, then you don't even want garbage collection.
With modern C++, and the stuff we have in C++11, garbage collection is no longer desirable except in limited circumstances. In fact, even if a good garbage collector is built into one of the major C++ compilers, I think that it won't be used very often. It will be easier, not harder, to avoid the GC.
He shows this example:
void f(int n, int x) {
Gadget *p = new Gadget{n};
if(x<100) throw SomeException{};
if(x<200) return;
delete p;
}
This is unsafe in C++. But it's also unsafe in Java! In C++, if the function returns early, the delete will never be called. But if you had full garbage collection, such as in Java, you merely get a suggestion that the object will be destructed "at some point in the future" (Update: it's even worse that this. Java does not promise to call the finalizer ever - it maybe never be called). This isn't good enough if Gadget holds an open file handle, or a connection to a database, or data which you have buffered for write to a database at a later point. We want the Gadget to be destroyed as soon as it's finished, in order to free these resources as soon as possible. You don't want your database server struggling with thousands of database connections that are no longer needed - it doesn't know that your program is finished working.
So what's the solution? There are a few approaches. The obvious approach, which you'll use for the vast majority of your objects is:
void f(int n, int x) {
Gadget p = {n}; // Just leave it on the stack (where it belongs!)
if(x<100) throw SomeException{};
if(x<200) return;
}
This takes fewer characters to type. It doesn't have new getting in the way. It doesn't require you to type Gadget twice. The object is destroyed at the end of the function. If this is what you want, this is very intuitive. Gadgets behave the same as int or double. Predictable, easy-to-read, easy-to-teach. Everything is a 'value'. Sometimes a big value, but values are easier to teach because you don't have this 'action at a distance' thing that you get with pointers (or references).
Most of the objects you make are for use only in the function that created them, and perhaps passed as inputs to child functions. The programmer shouldn't have to think about 'memory management' when returning objects, or otherwise sharing objects across widely separated parts of the software.
Scope and lifetime are important. Most of the time, it's easier if the lifetime is the same as the scope. It's easier to understand and easier to teach. When you want a different lifetime, it should be obvious reading the code that you're doing this, by use of shared_ptr for example. (Or returning (large) objects by value, leveraging move-semantics or unique_ptr.
This might seem like an efficiency problem. What if I want to return a Gadget from foo()? C++11's move semantics make it easier to return big objects. Just write Gadget foo() { ... } and it will just work, and work quickly. You don't need to mess with && yourself, just return things by value and the language will often be able to do the necessary optimizations. (Even before C++03, compilers did a remarkably good job at avoiding unnecessary copying.)
As Stroustrup said elsewhere in the video (paraphrasing): "Only a computer scientist would insist on copying an object, and then destroying the original. (audience laughs). Why not just move the object directly to the new location? This is what humans (not computer scientists) expect."
When you can guarantee only one copy of an object is needed, it's much easier to understand the lifetime of the object. You can pick what lifetime policy you want, and garbage collection is there if you want. But when you understand the benefits of the other approaches, you'll find that garbage collection is at the bottom of your list of preferences.
If that doesn't work for you, you can use unique_ptr, or failing that, shared_ptr. Well written C++11 is shorter, easier-to-read, and easier-to-teach than many other languages when it comes to memory management.
The idea behind C++ was that you would not pay any performance impact for features that you don't use. So adding garbage collection would have meant having some programs run straight on the hardware the way C does and some within some sort of runtime virtual machine.
Nothing prevents you from using some form of smart pointers that are bound to some third-party garbage collection mechanism. I seem to recall Microsoft doing something like that with COM and it didn't go to well.
To answer most "why" questions about C++, read Design and Evolution of C++
One of the fundamental principles behind the original C language is that memory is composed of a sequence of bytes, and code need only care about what those bytes mean at the exact moment that they are being used. Modern C allows compilers to impose additional restrictions, but C includes--and C++ retains--the ability to decompose a pointer into a sequence of bytes, assemble any sequence of bytes containing the same values into a pointer, and then use that pointer to access the earlier object.
While that ability can be useful--or even indispensable--in some kinds of applications, a language that includes that ability will be very limited in its ability to support any kind of useful and reliable garbage collection. If a compiler doesn't know everything that has been done with the bits that made up a pointer, it will have no way of knowing whether information sufficient to reconstruct the pointer might exist somewhere in the universe. Since it would be possible for that information to be stored in ways that the computer wouldn't be able to access even if it knew about them (e.g. the bytes making up the pointer might have been shown on the screen long enough for someone to write them down on a piece of paper), it may be literally impossible for a computer to know whether a pointer could possibly be used in the future.
An interesting quirk of many garbage-collected frameworks is that an object reference not defined by the bit patterns contained therein, but by the relationship between the bits held in the object reference and other information held elsewhere. In C and C++, if the bit pattern stored in a pointer identifies an object, that bit pattern will identify that object until the object is explicitly destroyed. In a typical GC system, an object may be represented by a bit pattern 0x1234ABCD at one moment in time, but the next GC cycle might replace all references to 0x1234ABCD with references to 0x4321BABE, whereupon the object would be represented by the latter pattern. Even if one were to display the bit pattern associated with an object reference and then later read it back from the keyboard, there would be no expectation that the same bit pattern would be usable to identify the same object (or any object).
SHORT ANSWER:
We don't know how to do garbage collection efficiently (with minor time and space overhead) and correctly all the time (in all possible cases).
LONG ANSWER:
Just like C, C++ is a systems language; this means it is used when you are writing system code, e.g., operating system. In other words, C++ is designed, just like C, with best possible performance as the main target. The language' standard will not add any feature that might hinder the performance objective.
This pauses the question: Why garbage collection hinders performance? The main reason is that, when it comes to implementation, we [computer scientists] do not know how to do garbage collection with minimal overhead, for all cases. Hence it's impossible to the C++ compiler and runtime system to perform garbage collection efficiently all the time. On the other hand, a C++ programmer, should know his design/implementation and he's the best person to decide how to best do the garbage collection.
Last, if control (hardware, details, etc.) and performance (time, space, power, etc.) are not the main constraints, then C++ is not the right tool. Other language might serve better and offer more [hidden] runtime management, with the necessary overhead.
All the technical talking is overcomplicating the concept.
If you put GC into C++ for all the memory automatically then consider something like a web browser. The web browser must load a full web document AND run web scripts. You can store web script variables in the document tree. In a BIG document in a browser with lots of tabs open, it means that every time the GC must do a full collection it must also scan all the document elements.
On most computers this means that PAGE FAULTS will occur. So the main reason, to answer the question is that PAGE FAULTS will occur. You will know this as when your PC starts making lots of disk access. This is because the GC must touch lots of memory in order to prove invalid pointers. When you have a bona fide application using lots of memory, having to scan all objects every collection is havoc because of the PAGE FAULTS. A page fault is when virtual memory needs to get read back into RAM from disk.
So the correct solution is to divide an application into the parts that need GC and the parts that do not. In the case of the web browser example above, if the document tree was allocated with malloc, but the javascript ran with GC, then every time the GC kicks in it only scans a small portion of memory and all PAGED OUT elements of the memory for the document tree does not need to get paged back in.
To further understand this problem, look up on virtual memory and how it is implemented in computers. It is all about the fact that 2GB is available to the program when there is not really that much RAM. On modern computers with 2GB RAM for a 32BIt system it is not such a problem provided only one program is running.
As an additional example, consider a full collection that must trace all objects. First you must scan all objects reachable via roots. Second scan all the objects visible in step 1. Then scan waiting destructors. Then go to all the pages again and switch off all invisible objects. This means that many pages might get swapped out and back in multiple times.
So my answer to bring it short is that the number of PAGE FAULTS which occur as a result of touching all the memory causes full GC for all objects in a program to be unfeasible and so the programmer must view GC as an aid for things like scripts and database work, but do normal things with manual memory management.
And the other very important reason of course is global variables. In order for the collector to know that a global variable pointer is in the GC it would require specific keywords, and thus existing C++ code would not work.
When we compare C++ with Java, we see that C++ was not designed with implicit Garbage Collection in mind, while Java was.
Having things like arbitrary pointers in C-Style is not only bad for GC-implementations, but it would also destroy backward compatibility for a large amount of C++-legacy-code.
In addition to that, C++ is a language that is intended to run as standalone executable instead of having a complex run-time environment.
All in all:
Yes it might be possible to add Garbage Collection to C++, but for the sake of continuity it is better not to do so.
Mainly for two reasons:
Because it doesn't need one (IMHO)
Because it's pretty much incompatible with RAII, which is the cornerstone of C++
C++ already offers manual memory management, stack allocation, RAII, containers, automatic pointers, smart pointers... That should be enough. Garbage collectors are for lazy programmers who don't want to spend 5 minutes thinking about who should own which objects or when should resources be freed. That's not how we do things in C++.
Imposing garbage collection is really a low level to high level paradigm shift.
If you look at the way strings are handled in a language with garbage collection, you will find they ONLY allow high level string manipulation functions and do not allow binary access to the strings. Simply put, all string functions first check the pointers to see where the string is, even if you are only drawing out a byte. So if you are doing a loop that processes each byte in a string in a language with garbage collection, it must compute the base location plus offset for each iteration, because it cannot know when the string has moved. Then you have to think about heaps, stacks, threads, etc etc.