Regarding mark-sweep (lazy approach) for garbage collection in C++? - c++

I know the reference counting technique, but I had never heard of the mark-sweep technique until today, when reading the book "Concepts of Programming Languages".
According to the book:
The original mark-sweep process of garbage collection operates as follows: The runtime system allocates storage cells as requested and disconnects pointers from cells as necessary, without regard for storage reclamation (allowing garbage to accumulate), until it has allocated all available cells. At this point, a mark-sweep process begins to gather all the garbage left floating around in the heap. To facilitate the process, every heap cell has an extra indicator bit or field that is used by the collection algorithm.
From my limited understanding, smart pointers in C++ libraries use the reference counting technique. I wonder: is there any C++ library whose smart pointers use this kind of implementation? And since the book is purely theoretical, I could not visualize how the implementation is done. An example to demonstrate this idea would be greatly valuable. Please correct me if I'm wrong.
Thanks,

There is one central difficulty in using garbage collection in C++: identifying what is a pointer and what is not.
If you can tweak a compiler to provide this information for each and every object type, then you're done; but if you cannot, then you need to use a conservative approach: scanning the memory for any bit pattern that may look like a pointer. There is also the difficulty of "bit stuffing" here, where people stuff flag bits into pointers (the high bits are mostly unused on 64-bit platforms) or XOR two different pointers together to "save space".
Now, in C++0x the Standard Committee introduced a standard API to help implement garbage collection. In n3225 you can find it at 20.9.11 Pointer safety [util.dynamic.safety]. This supposes, of course, that people will actually use those functions in their code:
void declare_reachable(void* p); // throws std::bad_alloc
template <typename T> T* undeclare_reachable(T* p) noexcept;
void declare_no_pointers(char* p, size_t n) noexcept;
void undeclare_no_pointers(char* p, size_t n) noexcept;
pointer_safety get_pointer_safety() noexcept;
When implemented, it will allow you to plug any garbage collection scheme (one that defines those functions) into your application. It will of course require some work to actually invoke those operations wherever they are needed. One solution could be to simply override new and delete, but that does not account for pointer arithmetic...
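As a rough illustration of the intended use (a sketch only: the functions are the ones from n3225, but the surrounding code is hypothetical, and most real implementations report relaxed pointer safety and treat these calls as no-ops):

#include <cstdint>
#include <memory>   // declare_reachable and friends live in <memory>

void bit_stuffing_example()
{
    int* p = new int(42);
    std::declare_reachable(p);   // promise: *p stays live even if the pointer is disguised
    std::uintptr_t stashed = reinterpret_cast<std::uintptr_t>(p) | 1;  // stuff a tag bit
    // ... a collector scanning for pointer patterns would not recognise 'stashed' ...
    p = std::undeclare_reachable(reinterpret_cast<int*>(stashed & ~std::uintptr_t(1)));
    delete p;   // p is safely-derived again
}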
Finally, there are many strategies for garbage collection: reference counting (with cycle-detection algorithms) and mark-and-sweep are the two main families, but each comes in various flavors (generational or not, copying/compacting or not, ...).
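To make the mark-sweep idea from the question concrete, here is a deliberately tiny, precise (non-conservative) sketch; every name in it is invented for illustration. Each cell carries the mark bit the book mentions, marking walks the object graph from a root set, and sweeping reclaims whatever stayed unmarked:

#include <cstddef>
#include <vector>

struct Cell {
    bool marked = false;           // the per-cell indicator bit from the book
    std::vector<Cell*> children;   // outgoing pointers, known exactly (not conservative)
};

class Heap {
    std::vector<Cell*> cells;      // every cell ever allocated
    std::vector<Cell*> roots;      // pointers the program can reach directly
public:
    Cell* allocate() {
        Cell* c = new Cell();
        cells.push_back(c);
        return c;
    }
    void add_root(Cell* c) { roots.push_back(c); }

    void collect() {
        for (Cell* r : roots) mark(r);              // phase 1: mark
        std::vector<Cell*> live;
        for (Cell* c : cells) {                     // phase 2: sweep
            if (c->marked) { c->marked = false; live.push_back(c); }
            else delete c;                          // unreachable: reclaim
        }
        cells.swap(live);                           // (a real heap would also free
    }                                               //  everything on destruction)
private:
    static void mark(Cell* c) {
        if (c->marked) return;                      // already visited (handles cycles)
        c->marked = true;
        for (Cell* child : c->children) mark(child);
    }
};

Note that a cycle of cells unreachable from any root is still reclaimed - exactly the case plain reference counting cannot handle.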

Although they may have changed it by now, Mozilla Firefox used to use a hybrid approach in which reference-counted smart pointers were used where possible, with a mark-and-sweep garbage collector running in parallel to clean up reference cycles. Other projects may have adopted this approach as well, though I'm not certain.
The main reason that I could see C++ programmers avoiding this type of garbage collection is that it means that object destructors would run asynchronously. This means that if any objects were created that held on to important resources, such as network connections or physical hardware, the cleanup wouldn't be guaranteed to occur in a timely fashion. Moreover, the destructors would have to be very careful to use appropriate synchronization if they were to access shared resources, while in a single-threaded, straight reference-counting solution this wouldn't be necessary.
The other complexity of this approach is that C++ allows for raw arithmetic operations on pointers, which greatly complicates the implementation of any garbage collector. It's possible to conservatively solve this problem (look at the Boehm GC, for example), though it's a significant barrier to building a system of this sort.
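For a feel of what conservative collection looks like in practice, here is a minimal Boehm GC usage sketch (GC_INIT and GC_MALLOC are the library's real entry points; the header location may vary by installation, and you link with -lgc):

#include <gc.h>   // Boehm-Demers-Weiser conservative collector

int main()
{
    GC_INIT();
    for (int i = 0; i < 1000000; ++i) {
        // allocate from the collected heap and never free explicitly
        int* p = static_cast<int*>(GC_MALLOC(sizeof(int)));
        *p = i;
    }
    // unreachable blocks are reclaimed by conservative stack/heap scanning
    return 0;
}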

Related

C++ std features and Binary size

I was told recently in a job interview that their project aims to build the smallest possible binary for their application (it runs embedded), so I would not be able to use things such as templates or smart pointers, as these would increase the binary size. They generally seemed to imply that using things from std would be a no-go (though not in all cases).
After the interview, I tried to research online which features from the standard library cause large binary sizes, and I could find basically nothing on the subject. Is there a way to quantify the size impact of using certain features (without, for example, having to write 100 smart pointers in a code base versus self-managed ones)?
This question probably deserves more attention than it’s likely to get, especially for people trying to pursue a career in embedded systems. So far the discussion has gone about the way that I would expect, specifically a lot of conversation about the nuances of exactly how and when a project built with C++ might be more bloated than one written in plain C or a restricted C++ subset.
This is also why you can’t find a definitive answer from a good old fashioned google search. Because if you just ask the question “is C++ more bloated than X?”, the answer is always going to be “it depends.”
So let me approach this from a slightly different angle. I've both worked for and interviewed at companies that enforced these kinds of restrictions; I've even voluntarily enforced them myself. It really comes down to this: when you're running an engineering organization with more than one person and plans to keep hiring, it is wildly impractical to assume everyone on your team is going to fully understand the implications of using every feature of a language. Coding standards and language restrictions serve as a cheap way to prevent people from doing "bad things" without knowing they're doing "bad things".
How you define a “bad thing” is then also context specific. On a desktop platform, using lots of code space isn’t really a “bad” enough thing to rigorously enforce. On a tiny embedded system, it probably is.
C++ by design makes it very easy for an engineer to generate lots of code without having to type it out explicitly. I think that statement is pretty self-evident; it's the whole point of meta-programming, and I doubt anyone would challenge it. In fact, it's one of the strengths of the language.
So then coming back to the organizational challenges, if your primary optimization variable is code space, you probably don’t want to allow people to use features that make it trivial to generate code that isn’t obvious. Some people will use that feature responsibly and some people won’t, but you have to standardize around the least common denominator. A C compiler is very simple. Yes you can write bloated code with it, but if you do, it will probably be pretty obvious from looking at it.
(Partially extracted from comments I wrote earlier)
I don't think there is a comprehensive answer. A lot also depends on the specific use case and needs to be judged on a case-by-case basis.
Templates
Templates may result in code bloat, yes, but they can also avoid it. If your alternative is introducing indirection through function pointers or virtual methods, then that indirect version may well end up bigger in code size, simply because function calls take several instructions and remove optimization potential.
Another aspect where they can at least not hurt is when used in conjunction with type erasure. The idea here is to write generic code, then put a small template wrapper around it that only provides type safety but does not actually emit any new code. Qt's QList is an example that does this to some extent.
This bare-bones vector type shows what I mean:
#include <cstddef>

class VectorBase
{
protected:
    // raw untyped storage pointers shared by every instantiation
    void **start, **end, **capacity;
    void push_back(void*);
    void* at(std::size_t i);
    void clear(void (*cleanup_function)(void*));
};

template<class T>
class Vector: public VectorBase
{
public:
    void push_back(T* value)
    { this->VectorBase::push_back(value); }
    T* at(std::size_t i)
    { return static_cast<T*>(this->VectorBase::at(i)); }
    ~Vector()
    // the unary + turns the capture-less lambda into a plain function pointer
    { clear(+[](void* object) { delete static_cast<T*>(object); }); }
};
By carefully moving as much code as possible into the non-templated base, the template itself can focus on type safety and on providing the necessary indirections, without emitting any code that wouldn't have been there anyway.
(Note: This is just meant as a demonstration of type erasure, not an actually good vector type)
Smart pointers
When written carefully, they won't generate much code that wouldn't be there anyway. Whether an inline function generates a delete statement or the programmer writes it manually doesn't really matter.
The main issue I see with them is that the programmer is better at reasoning about code and avoiding dead code. For example, even after a unique_ptr has been moved away from, its destructor still has to emit code. The programmer knows that the value is null at that point; the compiler often doesn't.
Another issue comes up with calling conventions. Objects with destructors are usually passed in memory on the stack, even if you declare them pass-by-value; the same goes for return values. So a function unique_ptr<foo> bar(unique_ptr<foo> baz) will have higher overhead than foo* bar(foo* baz), simply because the pointers have to be put on and taken off the stack.
Even more egregiously, the calling convention used on Linux, for example, makes the caller clean up parameters instead of the callee. That means that if a function accepts a complex object like a smart pointer by value, the call to the destructor for that parameter is replicated at every call site instead of appearing once inside the function. Especially with unique_ptr this is absurd, because the function itself may know that the object has been moved away and that the destructor is superfluous; but the caller doesn't know this (unless you have LTO).
Shared pointers are a different beast altogether, simply because they allow a lot of different tradeoffs. Should they be atomic? Should they allow type casting? Weak pointers? What indirection is used for destruction? Do you really need two raw pointers per shared pointer, or can the reference counter be reached through the shared object itself?
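On that last point, an intrusive count is one common answer: the counter lives inside the shared object, so the handle itself is a single raw pointer wide. A minimal, single-threaded sketch (all names invented):

#include <cstddef>

struct RefCounted {                // the count lives inside the shared object
    std::size_t refs = 0;
    virtual ~RefCounted() {}
};

template<class T>
class IntrusivePtr {               // the handle is just one pointer wide
    T* p;
public:
    explicit IntrusivePtr(T* q = nullptr) : p(q) { if (p) ++p->refs; }
    IntrusivePtr(const IntrusivePtr& other) : p(other.p) { if (p) ++p->refs; }
    IntrusivePtr& operator=(IntrusivePtr other)  // copy-and-swap
    {
        T* tmp = p; p = other.p; other.p = tmp;
        return *this;
    }
    ~IntrusivePtr() { if (p && --p->refs == 0) delete p; }
    T* operator->() const { return p; }
};

This is roughly the tradeoff boost::intrusive_ptr makes.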
Exceptions, RTTI
These are generally avoided and removed via compiler flags (e.g. -fno-exceptions and -fno-rtti on GCC and Clang).
Library components
On a bare-metal system, pulling in parts of the standard library can have a significant effect that can only be measured after the linker step. I suggest any such project use continuous integration and track code size as a metric.
For example, I once added a small feature (I don't remember which) whose error handling used std::stringstream. That pulled in the entire iostream library, and the resulting code exceeded my entire RAM and ROM capacity. IIRC the issue was that even though exception handling was deactivated, the exception message was still being set up.
Move constructors and destructors
It's a shame that C++'s move semantics aren't like Rust's, for example, where objects can be moved with a simple memcpy that then "forgets" their original location. In C++, the destructor of a moved-from object is still invoked, which requires more code in the move constructor / move assignment operator, and in the destructor.
Qt for example accounts for such simple cases in its meta type system.
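A small sketch of what that implies: even a moved-from object is still destroyed, so the move operations must leave it in a destructible state and the destructor must handle that state:

#include <cstddef>

class Buffer {
    int* data;
public:
    explicit Buffer(std::size_t n) : data(new int[n]) {}
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
    Buffer(Buffer&& other) noexcept : data(other.data)
    {
        other.data = nullptr;      // the source must be left destructible
    }
    Buffer& operator=(Buffer&& other) noexcept
    {
        if (this != &other) {
            delete[] data;         // release our own storage first
            data = other.data;
            other.data = nullptr;  // again: the source's destructor will still run
        }
        return *this;
    }
    ~Buffer() { delete[] data; }   // executed even for moved-from instances
};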

Pointers in C++: How large should an object be to need use of a pointer?

Often I read in the literature that one of the use cases of C++ pointers is for dealing with big objects, but how large should an object be for a pointer to be warranted when manipulating it? Is there any guiding principle in this regard?
I don't think size is the main factor to consider.
Pointers (or references) are a way to designate a single bunch of data (be it an object, a function or a collection of untyped bytes) from different locations.
If you do copies instead of using pointers, you run the risk of having two separate versions of the same data becoming inconsistent with each other. If the two copies are meant to represent a single piece of information, then you will have to do twice the work to make sure they stay consistent.
So in some cases, using a pointer to reference even a single byte could be the right thing to do, even though storing copies of said byte would be more efficient in terms of memory usage.
EDIT: to answer jogojapan's remarks, here is my opinion on memory efficiency.
I have often run programs through profilers and discovered that an amazing percentage of the CPU power went into various forms of memory-to-memory copies.
I also noticed that the cost of optimizing memory efficiency was often offset by code complexity, for surprisingly little gains.
On the other hand, I spent many hours tracing bugs down to data inconsistencies, some of them requiring sizeable code refactoring to get rid of.
As I see it, memory efficiency should become more of a concern near the end of a project, when profiling reveals where the CPU/memory drain really occurs, while code robustness (especially data flows and data consistency) should be the main factor to consider in the early stages of design and coding.
Only the bulkiest data types should be dimensioned at the start, and only if the application is expected to handle considerable amounts of data. On a modern PC, we are talking about hundreds of megabytes, which most applications will never need.
When I designed embedded software 10 or 20 years ago, memory usage was a constant concern. But in environments like a desktop PC, where memory requirements are most of the time negligible compared to the amount of available RAM, focusing on a reliable design seems more of a priority to me.
You should use a pointer when you want to refer to the same object from different places. In fact, you can use references for the same purpose, but pointers give you the added advantage of being able to refer to different objects over time, while a reference keeps referring to the same object.
On second thought, maybe you are referring to objects created on the free store using new etc. and then accessed through pointers. There is no definitive rule for that, but in general you can do so when:
the object being created is too large to be accommodated on the stack, or
you want to extend the lifetime of the object beyond the enclosing scope.
There is no such limitation or guideline. You will have to decide for yourself.
Assume the class definition below. Its size is 100 ints = 400 bytes (assuming a 4-byte int).
class test
{
private:
int m_nVar[100];
};
When you use the following function declaration (pass by value), the copy constructor will get called (even if you don't provide one). So a copy of 100 ints will happen, which will obviously take some time to finish:
void passing_to_function(test a);
When you change the function to take a reference or a pointer instead, no such copying happens; only a test* (the size of a single pointer) is transferred:
void passing_to_function(test& a); // or: void passing_to_function(test* a);
So passing by reference or by pointer has an obvious advantage over passing by value!

Static arrays VS. dynamic arrays in C++11

I know that this is a very old debate that has already been discussed many times all over the world. But I'm currently having trouble deciding which method I should use between static and dynamic arrays in a particular case. Actually, if it weren't for C++11 I would have used static arrays, but I'm now torn, since each could offer equivalent benefits.
First solution:
template<size_t N>
class Foo
{
private:
int array[N];
public:
// Some functions
};
Second solution:
template<size_t N>
class Foo
{
private:
int* array;
public:
// Some functions
};
I can't manage to choose, since each has its own advantages:
Static arrays are faster, and we don't have to care about memory management at all.
Dynamic arrays weigh nothing as long as no memory is allocated, though after that they are less handy to use than static arrays. And since C++11, we can get great benefits from move semantics, which we cannot use with static arrays.
I don't think there is one good solution, but I would like to get some advice or just to know what you think of all that.
I will actually disagree with the "it depends". Never use option 2. If you want a translation-time constant, always use option 1 or std::array. The one advantage you listed, that dynamic arrays weigh nothing until allocated, is actually a horrible, huge disadvantage, and one that needs to be pointed out with great emphasis.
Do not ever have objects that have more than one phase of construction. Never, ever. That should be a rule committed to memory through some large tattoo. Just never do it.
When you have zombie objects that are not quite alive yet, though not quite dead either, the complexity of managing their lifetime grows exponentially. You have to check in every method whether the object is fully alive, or only pretending to be alive. Exception safety requires special cases in your destructor. Instead of one simple construction and automatic destruction, you've now added requirements that must be checked in N different places (# methods + dtor). And the compiler doesn't care whether you check. And other engineers won't have this requirement broadcast to them, so they may adjust your code in unsafe ways, using variables without checking. And now all these methods have multiple behaviors depending on the state of the object, so every user of the object needs to know what to expect. Zombies will ruin your (coding) life.
Instead, if you have two different natural lifetimes in your program, use two different objects. But that means you have two different states in your program, so you should have a state machine, with one state having just one object and another state having both, separated by an asynchronous event. If there is no asynchronous event between the two points, i.e. they all fit in one function scope, then the separation is artificial and you should be doing single-phase construction.
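To illustrate the difference with a deliberately small (and hypothetical) example:

#include <cassert>
#include <cstddef>
#include <vector>

class TwoPhase {            // the anti-pattern: only alive after init() is called
    int* data = nullptr;
public:
    void init(std::size_t n) { data = new int[n](); }
    int& at(std::size_t i)
    {
        assert(data != nullptr);   // every method must check for the zombie state
        return data[i];
    }
    ~TwoPhase() { delete[] data; } // must tolerate never-initialised instances
};

class SinglePhase {         // fully alive the moment the constructor returns
    std::vector<int> data;
public:
    explicit SinglePhase(std::size_t n) : data(n) {}
    int& at(std::size_t i) { return data.at(i); }  // no liveness check needed
};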
The only case where a translation-time size should translate into a dynamic allocation is when the size is too large for the stack. This then gets into memory optimisation, and it should always be evaluated using memory and profiling tools to see what's best. Option 2 will never be best (it uses a naked pointer, so again we lose RAII and any automatic cleanup and management, adding invariants and making the code more complex and easily breakable by others). A vector (as suggested by bitmask) would be the appropriate first thought, though you may not like the heap-allocation cost in time. Other options might be static space in your application's image. But again, these should only be considered once you've determined that you have a memory constraint, and what to do from there should be determined by actual, measurable needs.
Use neither. You're better off using std::vector in nearly any case. In the other cases, it heavily depends on the reason why std::vector would be insufficient, and hence the question cannot be answered generally!
I'm currently having trouble deciding which method I should use in a particular case.
You'll need to consider your options case-by-case to determine the optimal solution for the given context -- that is, a generalization cannot be made. If one container were ideal for every scenario, the other would be obsolete.
As mentioned already, consider using std implementations before writing your own.
More details:
Fixed Length
Be careful of how much of the stack you consume.
May consume more memory, if you treat it as a dynamically sized container.
Fast copies.
Variable Length
Reallocation and resizing can be costly.
May consume more memory than needed.
Fast moves.
The better choice also requires that you understand the complexity of creation, copying, assignment, etc. for the element types.
And if you do use std implementations, remember that implementations may vary.
Finally, you can create a container for these types which abstracts the implementation details and selects an appropriate data member based on the size and context, hiding the detail behind a general interface (see the sketch below). This is also useful at times to disable features, or to make some operations (e.g. costly copies) more obvious.
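For instance (a rough sketch; the 256-byte threshold and all names here are invented):

#include <array>
#include <cstddef>
#include <type_traits>
#include <vector>

template<typename T, std::size_t N>
class AutoStorage {
    static const bool Small = N * sizeof(T) <= 256;  // arbitrary cut-off
    // compile-time choice: small payloads live inline, large ones on the heap
    typename std::conditional<Small, std::array<T, N>, std::vector<T> >::type data;
public:
    AutoStorage() { init(std::integral_constant<bool, Small>()); }
    T& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return N; }
private:
    void init(std::true_type) {}                    // inline array: nothing to do
    void init(std::false_type) { data.resize(N); }  // heap-backed: allocate N slots
};

The compile-time branch costs nothing at runtime; only the member functions of the chosen storage type are ever instantiated.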
In short, you need to know a lot about the types and usage, and measure several aspects of your program to determine the optimal container type for a specific scenario.

Understanding the library functions in c++ [closed]

If I'd like to know how a function from, say, the standard C++ library works (not just the MSDN description) - I mean how it allocates, manages, and deallocates memory and returns the result to you - where would I look, and what do I need to know to understand that?
You can look at the library headers. A lot of functionality is actually implemented there because the library is highly templatized (and templates generally need to be implemented in headers). The location of the headers depends on the compiler, but you should be able to find them quite easily (e.g. search for a file named algorithm).
You may also ask the compiler to preprocess your code to see it with all the included headers expanded (this will produce extremely long output). With GCC you can do this via g++ -E yoursource.cc.
If what you are looking for isn't implemented in headers, you need the library sources, which are generally not installed by default and which are not even available for commercial compilers such as MSVC. Look for glibc (C library) and libstdc++ (C++ library), which are the ones used by GCC and some other compilers.
In any case, notice that the standard library implementations tend to be rather cryptic due to a lot of underscores being used in variable names and such (to avoid name collisions with user's macros), and often they are also infested with #ifdefs and other preprocessor cruft.
You need to know the techniques used to write C++ libraries. Getting Bjarne Stroustrup's book is a good start. Also, SGI has very detailed documentation on the STL at a suitably high level of abstraction.
If you are going to be investigating the Windows-based stuff, you might want to study the systems part of the Windows library.
To complement Windows: understanding the POSIX specification is also important.
First a few basic data-structure principles, then a note and some links about allocators...
The STL containers use a number of different data structures. The map, set, multimap and multiset are normally implemented as binary trees with red-black balancing rules, for example, and deque is possibly (more impression than knowledge) a circular queue in an array, exploiting an array-doubling or similar growth pattern.
None of the data structures are actually defined by the standard - but the specified performance characteristics limit the choices significantly.
Normally, your contained data is contained directly in the data structure nodes, which are held (by default) in heap allocated memory. You can override the source of memory for nodes by providing an allocator template parameter when you specify the container - more on that later. If you need the container nodes to reference (not contain) your items, specify a pointer or smart-pointer type as the contained type.
For example, in an std::set of ints, the nodes will be binary tree nodes with space in them for an int, the two child pointers, and the metadata that the library needs (e.g. the red/black flag). The binary tree node will not move around your application's address space, so you can store pointers to your data items elsewhere if you want, but that isn't true for all containers - e.g. an insert in a vector moves all items above the insert point up by one, and may have to reallocate the whole vector, moving all items.
The container class instance itself is normally very small - a few pointers is typical. For example, std::set and friends usually have a root pointer, a pointer to the lowest-key node, a pointer to the highest-key node, and probably a bit more metadata.
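For instance, a node in a typical std::set of ints might conceptually look like this (an illustrative guess at the layout, not any particular library's actual definition):

struct SetNode {
    SetNode* parent;   // tree linkage metadata
    SetNode* left;
    SetNode* right;
    bool     is_red;   // the red/black balancing flag
    int      value;    // the contained item, stored in-place in the node
};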
One issue the STL faces is creating and destroying instances in multi-item nodes without creating/destroying the node. This happens in std::vector and std::deque, for instance. I don't know, strictly, how the STL does it - but the obvious approach requires placement new and explicit destructor calls.
Placement new allows you to create an object in an already-allocated piece of memory. It basically calls the constructor for you. It can take parameters, so it can call a copy constructor or other constructor, not just the default constructor.
http://www.devx.com/tips/Tip/12582
To destruct, you literally call the destructor explicitly, via a (correctly typed) pointer.
((mytype*)(void*)x)->~mytype();
This works even if you haven't declared an explicit destructor, and even for built-in types like "int" that don't need destructing.
Likewise, to assign from one constructed instance to another, you make an explicit call to operator=.
Basically, the containers are able to create, copy and destroy data within an existing node fairly easily, and where needed, metadata tracks which items are currently constructed in the node - e.g. size() indicates which items are currently constructed in an std::vector - there may be additional non-constructed items, depending on the current capacity().
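Putting those pieces together, the obvious approach to a vector-style grow operation would look roughly like this (a sketch only; real implementations differ):

#include <cstddef>
#include <new>        // placement new

template<class T>
void grow(T*& storage, std::size_t size, std::size_t& capacity)
{
    std::size_t new_capacity = capacity ? capacity * 2 : 8;  // array-doubling growth
    // grab raw, uninitialised memory - no T constructors run here
    T* raw = static_cast<T*>(::operator new(new_capacity * sizeof(T)));
    for (std::size_t i = 0; i < size; ++i) {
        new (raw + i) T(storage[i]);  // placement new: copy-construct into the new slot
        storage[i].~T();              // explicit destructor call on the old slot
    }
    ::operator delete(storage);       // assumes storage came from ::operator new too
    storage = raw;
    capacity = new_capacity;
}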
EDIT - It's possible that the STL can optimise by using (directly, or in effect) std::swap rather than operator= to move data around. This would be good where the data items are (for example) other STL containers, and thus own lots of referenced data - swapping could avoid lots of copying. I don't know if the standard requires this, or allows but doesn't mandate it. There is a well-known mechanism for doing this kind of thing, though, using a "traits" template. The default "traits" can provide an assignment-using method whereas specific overrides may support special-case types by using a swapping method. The abstraction would be a move where you don't care what is left in the source (original data, data from target, whatever) as long as it's valid and destructible.
In binary tree nodes, of course, there should be no need for this as there is only one item per node and it's always constructed.
The remaining problem is how to reserve correctly-aligned and correctly-sized space within a node struct to hold an unknown type (specified as a template parameter) without getting unwanted constructor/destructor calls when you create/destroy the node. This will get easier in C++0x, since a union will be able to hold non-POD types, giving a convenient uninitialised-space type. Until then, there's a range of tricks that more-or-less work with different degrees of portability, and no doubt a good STL implementation is a good example to learn from.
Personally, my containers use a space-for-type template class. It uses compiler-specific allocation checks to determine the alignment at compile-time and some template trickery to choose from an array-of-chars, array-of-shorts, array-of-longs etc of the correct size. The non-portable alignment-checking tricks are selected using "#if defined" etc, and the template will fail (at compile time) when someone throws a 128-bit alignment requirement at it because I haven't allowed for that yet.
How are the nodes actually allocated? Well, most (all?) STL containers take an "Allocator" parameter, which defaults to std::allocator. That standard implementation gets memory from, and releases it to, the heap. Implement the right interface and it can be replaced with a custom allocator.
Doing that is something I don't like to do, and certainly not without Stroustrup's "The C++ Programming Language" on my desk. There are a lot of requirements to meet in an allocator class, and at least in the past (things may have improved), compiler error messages were not helpful.
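To give a flavour of the interface, here is a minimal allocator that just forwards to malloc/free, sketched against the slimmer C++11 requirements (a production allocator needs more care than this):

#include <cstddef>
#include <cstdlib>
#include <new>

template<class T>
struct MallocAllocator {
    typedef T value_type;
    MallocAllocator() {}
    template<class U> MallocAllocator(const MallocAllocator<U>&) {}
    T* allocate(std::size_t n)
    {
        if (void* p = std::malloc(n * sizeof(T)))
            return static_cast<T*>(p);
        throw std::bad_alloc();   // allocators report failure by throwing
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template<class T, class U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) { return true; }
template<class T, class U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) { return false; }

// usage: std::vector<int, MallocAllocator<int> > v;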
Google says you could look here, though...
http://www2.roguewave.com/support/docs/leif/sourcepro/html/toolsug/12-6.html
http://en.wikipedia.org/wiki/Allocator_%28C%2B%2B%29
Operating system functions to allocate/free memory are not really relevant to the C++ standard library.
The standard library containers will (by default) use new and delete for memory, and that uses a compiler-specific runtime which almost certainly manages its own heap data structure. This approach is generally more appropriate for typical application use, whereas the platform-specific operating system heap is usually more appropriate for allocating large blocks.
The application heap will allocate/free memory from the operating system heap, but "how?" and "when?" are platform-specific and compiler-specific details.
For the Win32 memory management APIs, look here...
http://msdn.microsoft.com/en-us/library/ms810603.aspx
I'm sure you can find win64 equivalents if needed.
I don't have this book, but according to its description, http://www.amazon.com/C-Standard-Template-Library/dp/0134376331 includes
- Practical techniques for using and implementing the component
Isn't this what you want?

How to implement thread safe reference counting in C++

How do you implement an efficient and thread safe reference counting system on X86 CPUs in the C++ programming language?
I always run into the problem that the critical operations are not atomic, and the available X86 interlocked operations are not sufficient for implementing the ref-counting system.
The following article covers this topic, but requires special CPU instructions:
http://www.ddj.com/architect/184401888
Nowadays, you can use the Boost/TR1 shared_ptr<> smart pointer to keep your reference counted references.
Works great; no fuss, no muss. The shared_ptr<> class takes care of all the locking needed on the refcount.
In VC++, you can use _InterlockedCompareExchange.
do
    read the count
    perform the mathematical operation
    InterlockedCompareExchange(destination, updated count, old count)
until InterlockedCompareExchange returns the success code
On other platforms/compilers, use the appropriate intrinsic for the LOCK CMPXCHG instruction that MS's _InterlockedCompareExchange exposes.
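Spelled out in C++11 atomics, which wrap the same LOCK CMPXCHG loop portably (on VC++ you would substitute _InterlockedCompareExchange), the sketch above becomes something like this:

#include <atomic>

class RefCount {
    std::atomic<long> count;
public:
    RefCount() : count(1) {}
    void increment()
    {
        long old = count.load();
        // on failure, compare_exchange_weak refreshes 'old', so just retry
        while (!count.compare_exchange_weak(old, old + 1)) {}
    }
    bool decrement()   // returns true when the last reference is released
    {
        long old = count.load();
        while (!count.compare_exchange_weak(old, old - 1)) {}
        return old == 1;
    }
};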
Strictly speaking, you'll need to wait until C++0x to be able to write thread-safe code in pure C++.
For now, you can use POSIX, or create your own platform-independent wrappers around compare-and-swap and/or interlocked increment/decrement.
Win32 InterlockedIncrementAcquire and InterlockedDecrementRelease (if you want to be safe and care about platforms with possible reordering, and hence need to issue memory barriers at the same time), or InterlockedIncrement and InterlockedDecrement (if you are sure you will stay on x86), are atomic and will do the job.
That said, Boost/TR1 shared_ptr<> will handle all of this for you, so unless you need to implement it on your own, you will probably do best to stick with it.
Bear in mind that the locking is very expensive, and it happens every time you hand objects around between smart pointers - even when the object is currently owned by one thread (the smart pointer library doesn't know that).
Given this, there may be a rule of thumb applicable here (I'm happy to be corrected!)
If the following things apply to you:
You have complex data structures that would be difficult to write destructors for (or where STL-style value semantics would be inappropriate, by design) so you need smart pointers to do it for you, and
You're using multiple threads that share these objects, and
You care about performance as well as correctness
... then actual garbage collection may be a better choice. Although GC has a bad reputation for performance, it's all relative. I believe it compares very favourably with locking smart pointers. It was an important part of why the CLR team chose true GC instead of something using reference counting. See this article, in particular this stark comparison of what reference assignment means if you have counting going on:
no ref-counting:

a = b;

ref counting:

if (a != null)
    if (InterlockedDecrement(ref a.m_ref) == 0)
        a.FinalRelease();
if (b != null)
    InterlockedIncrement(ref b.m_ref);
a = b;
If the instruction itself is not atomic, then you need to make the section of code that updates the appropriate variable a critical section.
i.e. you need to prevent other threads from entering that section of code by using some locking scheme. Of course the locks need to be atomic, but you can find an atomic locking mechanism in the pthread library's mutex (pthread_mutex_t).
On the question of efficiency: the pthread library is as efficient as it can be while still guaranteeing that the mutex lock is atomic for your OS.
Is it expensive? Probably. But for everything that requires a guarantee, there is a cost.
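A minimal sketch of that mutex-guarded approach (type and method names are my own):

#include <pthread.h>

class LockedRefCount {
    pthread_mutex_t lock;
    long count;
public:
    LockedRefCount() : count(1) { pthread_mutex_init(&lock, 0); }
    ~LockedRefCount() { pthread_mutex_destroy(&lock); }
    void increment()
    {
        pthread_mutex_lock(&lock);     // critical section around the update
        ++count;
        pthread_mutex_unlock(&lock);
    }
    bool decrement()                   // true when the count reaches zero
    {
        pthread_mutex_lock(&lock);
        bool last = (--count == 0);
        pthread_mutex_unlock(&lock);
        return last;
    }
};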
The code posted in that DDJ article is adding extra complexity to account for bugs in using smart pointers.
Specifically, if you can't guarantee that the smart pointer won't change in an assignment to another smart pointer, you are doing it wrong or are doing something very unreliable to begin with. If the smart pointer can change while being assigned to another smart pointer, that means that the code doing the assignment doesn't own the smart pointer, which is suspect to begin with.