How can runtime systems support "GC" on compiled binaries?

How can runtime systems support "GC" on compiled binaries? - c++

So basically I only know some basic concept of GC:(
I am new in functional programming language, and when I am studying the Haskell's runtime system, RTS, I found that RTS support GC for the compiled binary from Haskell.
So I am confused with this part, is there a separate process created by RTS that can do GC stuff towards Haskell binaries?
What's more, if so, is there any GC implementation for C/C++ ? (Suppose programmers only use kinda of "smart pointer" to use memory, and they don't need to care about memory management, GC process will take care of it)...? As far as I know, it seems that .net can work in this way... am I right?

is there a separate process created by RTS that can do GC stuff towards Haskell binaries?
Garbage collection does not, in general, require a separate process (e.g., a "background" process), although some virtual-machine garbage collectors might do that (I don't know).
Think of the garbage collector as being merged with the heap allocator. Whenever you ask for some memory to be allocated, you invoke the heap to do so (which may hang for some time before finding a chunk to allocate). With a garbage collector, there is simply an additional step where the garbage collector first checks if it should do some collecting before allocating some memory.
What is central to garbage collection is the instrumentation (or annotation) of the memory to be able to infer which chunks of memory are still referred to / reachable. This kind of instrumentation simply implies additional operations during certain key points, like allocation, deallocation (if there is an explicit mechanism for it), and setting / getting pointer values (which are usually "hidden" under-the-hood in a garbage collected language, as opposed to languages like C/C++ that have "raw" pointers).
What's more, if so, is there any GC implementation for C/C++ ?
Yes, there are all sorts of GC implementations for C/C++. They are not very popular (AFAIK), probably because C is low-level enough that things are usually manageable, and C++ has RAII (and smart pointers) which almost-completely eliminates the need for GC.
But if you really want GC for C/C++, you can certainly get it. There are both non-deterministic GC libraries (similar to JVM or .NET) and deterministic GC libraries (similar to PHP or Python).
As far as I know, it seems that .net can work in this way... am I right?
Yeah, .NET like Java, is garbage collected. From a GC user's perspective, .NET is about the same as the JVM, as far as I know.

So I am confused with this part, is there a separate process created by RTS that can do GC stuff towards Haskell binaries?
No. Why would there be? You don't need a separate process to do GC. The Java JVM doesn't need a separate process. You seem to have some confusion somewhere. All you need is some code in the runtime library.
What's more, if so, is there any GC implementation for C/C++ ?
Yes.
As far as I know, it seems that .net can work in this way... am I right?
If you mean that .NET has GC you're correct. If you mean that it's done via smart pointers or a separate process you're wrong.

Related

Are there stackless or heapless implementation of C++?

C++ standard does not mention anything about the stack or the heap, they are implementation specific, which is true.
Even though they are not part of the C++ standard, we end up using them anyway, so much that it's like they are part of the language itself and have to be taken into consideration for memory or performance purpose.
Hence my question are there implementations of C++ that doesn't use stacks and heaps?

Others have already given good answers about the heap, so I'll leave that alone.
Some implementations (e.g., on IBM mainframes) don't use a stack as most people would think of it, for the simple reason that the hardware doesn't support it. Instead, when you call a function, an activation record (i.e., space for the locals, arguments, and return address) is allocated from (their version of) the heap. These activation records are built into a linked list.
From a purely abstract viewpoint, this is certainly a stack -- it supports last-in, first-out semantics, just like any other stack. You do have to look at it pretty abstractly to call it a stack though. If you showed people a diagram of the memory blocks linked together, I think it's safe to guess most programmers would describe it as a linked list. If you pushed them, I think most would judge it something like "yeah, you can use it in a stack-like manner, but it's still a linked list."

C++ standard does not mention anything about the stack or the heap
It actually does -- just not in those words, and without specifying how the stacks & heaps are implemented.
In C++03 there are three kinds of variables:
Those with static storage duration (3.7.1). These are "in-scope" for the duration of the program.
Those with automatic storage duration (3.7.2). These are in scope only in the context in which they are declared. Once they fall out of scope, the variable is destroyed & deallocated.
Those with dynamic storage duration (3.7.3). These are created with a new expression, and are destroyed with a delete. the objects themselves are scopeless, in the sense that their lifetime is not bound to the context in which they were newed. Immediate Pointers to these object are, of course, scoped. The pointers are of automatic or, rarely (and usually wrongly) static storage duration.
"Stack" and "Heap" are really just where later second two types of objects live. They are platform-dependant implementation details which realize language requirements.
So, technically you're right. The Standard doesn't say anything about heaps & stacks. But it does say quite a bit about different flavors of storage duration which requires some kind of implementation on a real platform. On most modern PC-type hardware, this is implemented as heaps & stacks. Could the different types of storage duration be implemented on a platform without using heaps or stacks? Anything is possible -- I suppose that it could. But whatever that implementation ended up being, it would probably have characteristics similar to at least one of the two.
In addition to all this, there is the consideration that both automatic and dynamic storage duration are required by the Standard. Any language implementation that didn't meet both of these requirements wouldn't be C++. It might be close, but it wouldn't really be C++.

For small programming environments, for example the arduino platform which was based on an 8K Atmel microprocessor (now it has 32K or more), a heap is not implemented and there is no new operator defined by the library. All objects are created statically or on the stack. You lose the advantages of the standard library, but gain being able to use an object-oriented language to program a very small platform - for example,creating classes to represent pins configured as particular output modes or serial ports, create an object of that class giving it the pin number and then calling functions on that object rather than having to pass the pin number around to your routines.
If you use new on an arduino, your program compiles but does not link - the compiler is g++ targeting the avr instruction set, so is a true C++ compiler. If you chose to provide your own implementation, you could do so, but the cost of providing an implementation on so small a footprint is not worth the gain in most cases.

Yes there are, mainly MCUs like Freescale and PIC
2. Stackless processors are being used now.
We don't look at cores the way assembly programmers do. We're happy
with bare-bones programmer's models, so long as the required
fundamentals are present. A stack is not one of them: we've run into
several stackless cores recently, and have had no problems developing
C compilers for them.
The eTPU is a 24-bit enhanced time processing unit, used in automotive
and general aviation engine control, and process control. eTPU may be
a co-processor, but it has a complete instruction set and a full CPU
core, and it's stackless. It is a mid-volume processor: chances are
you'll drive or fly home tonight courtesy of C on a stackless
processor.
We have a C compiler for the eTPU based on C99 and ISO/IEC 18037.
We run standard C test suites on this processor.
The Freescale RS08 is a more traditional MCU that is stackless. In the
process of "reducing" the HC08/HCS08 core, Freescale removed the CPU
stack. We consulted on the architecture of RS08, and we never felt the
need to insist on a hardware stack.
To mention another co-processor we consulted on, the Freescale XGATE
has a very friendly ISA and programmer's model, but it doesn't have a
stack.
Then there are the "nearly stackless". Microchip PIC never had data
stacking, with only an 8-entry (or 16-entry in the enhanced 14-bit
core) call-return stack. Nobody doubts that C is available for PICs.
These parts, especially the eTPU, were designed to be compiler
friendly and encourage machine-generated code.
There are other non-stack-based processors, some created recently,
spanning a range of applications and enjoying their own C
compilers. There are mid- to high-volume stackless parts. Performance
is usually the primary reason that parts do not have a stack.
http://www.bytecraft.com/Stack_controversy

I dare to say there is no such C++ implementation, but simply because the stack and heap are very useful abstractions for which basically all processors on the market provide some HW support to make them very efficient.
Since C++ aims at efficiency, C++ implementations will make use of them. Additionally, C++ program don't typically operate in a vacuum. They must integrate into the platform ecosystem, which is defined by the platform's Application Binary Interface. The ABI - for the very same reasons - defines stack and other memory structures the C++ implementation will need to obey to.
However, let's assume your C++ program is targeted to a simple, small, resource constrained embedded platform with an exotic microcontroller and no Operating System (your application will be the OS!) and no threading or processes.
To start with, the platform may not provide dynamic memory at all. You will need to work with a pool of static memory defined at link time, and develop your own memory allocation manager (new). C++ allows it, and in some environments it is indeed used.
Additionally, the CPU may be such that stack abstraction is not that useful, and therefore not worth implementing. For instance, CPUs like SPARC define a sliding register window mechanism that - combined with a large amount of registers - makes use of the stack not efficient for function calls (if you look at it, the stack is already done in HW!).
Long story short, all C++ implementation use stack, most use the heap, but the reason is strongly correlated to the platform properties.

This is essentially echo'ing Mr. TA's answer (+1 BTW). Stack and heap are abstract concepts.
The new and delete operators (and malloc and free functions) are really just an interface to the abstraction called a heap. So, when you ask for a C++ implementation to be "heapless", you are really asking for the implementation to not allow you to use these interfaces. I don't think there is anything preventing an implementation from always failing these interfaces.
Calling functions and resuming the current execution after the call returns (and optionally retrieving a return value) are interfaces to a stack abstraction. When you are asking for the C++ implementation to be "stackless", you are asking for the C++ implementation to disallow the program from performing these actions. I can't think of a conforming way for the compiler to impose this condition. The language dictates the source code be allowed to define functions, and to define code to call functions.
So my answer in terms of what is possible: "stackless" no, "heapless" yes.

There can't be a stack-less and heap-less implementation since C++ defines constructs such as functions and the new operator. Calling a function requires a stack, and "new"ing up an instance requires a heap. How those are implemented can be different between platforms, but the idea will be the same. There will always need to be a memory area for instance objects, and another memory area to track the execution point and call hierarchy.
Since x86 (and x64) have convenient facilities for these things (ie. the ESP register), the compiler uses that. Other platforms might be different, but the end result is logically equivalent.

Is it possible to introduce Automatic Reference Counting (ARC) to C++?

Objective C has introduced a technology called ARC to free the developer from the burden of memory management. It sounds great, I think C++ developers would be very happy if g++ also has this feature.
ARC allows you to put the burden of memory management on the (Apple LLVM 3.0) compiler, and never think about retain, release and autorelease ever again
So, if LLVM3.0 can do that, I think g++ also can free C++ developers from the tough jobs of memory management, right?
Is there any difficulties to introduce ARC to C++?
What I mean is: If we don't use smart pointers, we just use new/new[], is it possible for a compiler to do something for us to prevent memory leaks? For example, change the new to a smart pointer automatically?

C++ has the concept of Resource Allocation is Initialization(RAII) & intelligent use of this method saves you from explicit resource management.
C++ already provides shared_ptr which provides reference counting.
Also, there are a host of other Smart pointers which employ RAII to make your life easier in C++.

This is a good question. ARC is much more than just an implementation of smart pointers. It is also different from garbage collection, in that it does give you full control over memory management.
In ARC you know exactly when objects are going to be released. The reason people think is isn't true, is that there's no explicit "release" call that you write. But you know when the compiler will insert one. And it's not in some garbage collection step, it's inline when objects are considered no longer needed.
It contains a compiler step that analyzes the code and tries to find any redundant sequences of incrementing and decrementing reference counts. This could probably be achieved by an optimizing C++ compiler if it was given a smart pointer implementation that its optimizer could see through.
ARC also relies on the semantics of objective c. Firstly, pointers are annotated to say whether they are strong or weak. This could also be done in C++, just by having two different pointer classes (or using smart and vanilla pointers). Secondly, it relies on naming conventions for objective c methods to know whether their return values should be implicitly weak or strong, which means it can work alongside non-ARC code (ARC needs to know if your non-ARC code intended to return an object with a +1 reference count, for example). If your "C ARC" didn't sit alongside non-"C ARC" code you wouldn't need this.
The last thing that ARC gives you, is really good analysis of your code to say where it thinks leaks may be, at compile time. This would be difficult to add to C++ code, but could be added into the C++ compiler.

There's no need. We have shared pointers that do this for us. In fact, we have a range of pointer types for a variety of different circumstances, but shared pointers mimic exactly what ARC is doing.
See:
std::shared_ptr<>
boost::shared_ptr<>

Recently I wrote some Objective-C++ code using Clang and was surprised to find that Objective-C pointers were actually handled as non-POD types in C++ that I could use in my C++ classes without issues.
They were actually freed automatically in my destructors!
I used this to store weak references in std::vectors because I couldn't think of a way to hold an NSArrary of weak references..
Anyways, it seems to me like Clang implements ARC in Objective-C by emulating C++ RAII and smart pointers in Objective-C. When you think about it, every NSObject* in ARC is just a smart pointer (intrusive_ptr from Boost) in C++.
The only difference I can see between ARC and smart pointers is that ARC is built into the language. They have the same semantics besides that.

One of the reasons C++ is used at all is full control over memory management. If you don't want that in a particular situation there are smart pointers to do the managing for you.
Managed memory solutions exist, but in the situation C++ is chosen rightfully (for large-scale big applications), it is not a viable option.

What's the advantage of using ARC rather than full garbage collection? There was a concrete proposal for garbage collection before the committee; in the end, it wasn't handled because of lack of time, but there seems to be a majority of the committee (if not truly a consensus) in favor of adding garbage collection to C++.
Globally, reference counting is a poor substitute for true garbage collection: it's expensive in terms of run time, and it needs special code to handle cycles. It's applicable in specific limited cases, however, and C++ offers it via std::shared_ptr, at the request of the programmer, when he knows it's applicable.

Take a look of Qt. Qt has implemented this feature by leverage the hierarchy chain. You can new a pointer and assign a parent to it, Qt will help you to manage the memory.

There are already some implementations of similar technologies for C++; e.g., Boehm-Demers-Weiser garbage collector.
C++11 has a special Application Binary Interface for anyone wishing to add her own garbage collection.
In the vast majority of cases, techniques like smart pointers can do the job of painless memory management for C++ developers.

Microsoft C++/CX has ARC for ref classes. Embarcadero has 2 C++ compilers, one of them has ARC.

Are there any downsides in using C++ for network daemons?

I've been writing a number of network daemons in different languages over the past years, and now I'm about to start a new project which requires a new custom implementation of a properitary network protocol.
The said protocol is pretty simple - some basic JSON formatted messages which are transmitted in some basic frame wrapping to have clients know that a message arrived completely and is ready to be parsed.
The daemon will need to handle a number of connections (about 200 at the same time) and do some management of them and pass messages along, like in a chat room.
In the past I've been using mostly C++ to write my daemons. Often with the Qt4 framework (the network parts, not the GUI parts!), because that's what I also used for the rest of the projects and it was simple to do and very portable. This usually worked just fine, and I didn't have much trouble.
Being a Linux administrator for a good while now, I noticed that most of the network daemons in the wild are written in plain C (of course some are written in other languages, too, but I get the feeling that > 80% of the daemons are written in plain C).
Now I wonder why that is.
Is this due to a pure historic UNIX background (like KISS) or for plain portability or reduction of bloat? What are the reasons to not use C++ or any "higher level" languages for things like daemons?
Thanks in advance!
Update 1:
For me using C++ usually is more convenient because of the fact that I have objects which have getter and setter methods and such. Plain C's "context" objects can be a real pain at some point - especially when you are used to object oriented programming.
Yes, I'm aware that C++ is a superset of C, and that C code is basically C++ you can compile any C code with a C++ compiler. But that's not the point. ;)
Update 2:
I'm aware that nowadays it might make more sense to use a high level (scripting) language like Python, node.js or similar. I did that in the past, and I know of the benefits of doing that (at least I hope I do ;) - but this question is just about C and C++.

I for one can't think of any technical reason to chose C over C++. Not one that I can't instantly think of a counterpoint for anyway.
Edit in reply to edit: I would seriously discourage you from considering, "...C code is basically C++." Although you can technically compile any C program with a C++ compiler (in as far as you don't use any feature in C that's newer than what C++ has adopted) I really try to discourage anyone from writing C like code in C++ or considering C++ as "C with objects."
In response to C being standard in Linux, only in as far as C developers keep saying it :p C++ is as much part of any standard in Linux as C is and there's a huge variety of C++ programs made on Linux. If you're writing a Linux driver, you need to be doing it in C. Beyond that...I know RMS likes to say you're more likely to find a C compiler than a C++ one but that hasn't actually been true for quite a long time now. You'll find both or neither on almost all installations.
In response to maintainability - I of course disagree.
Like I said, I can't think of one that can't instantly be refuted. Visa-versa too really.

The resistance to C++ for the development for daemon code stem from a few sources:
C++ has a reputation for being hard to avoid memory leaks. And memory leaks are a no no in any long running software. This is to a degree untrue - the problem is developers with a C background tend to use C idioms in C++, and that is very leaky. Using the available C++ features like vectors and smart pointers can produce leak free code.
As a converse, the smart pointer template classes, while they hide resource allocation and deallocation from the programmer, do a lot of it under the covers. In fact C++ generally has a lot of implicit allocation as a result of copy constructors and so on. As a result the C++ heap can become fragmented over time and daemon processes will eventually fail with an out of memory error even though there is sufficient RAM. This can be ameliorated by the use of modern heap managers that are more fragmenttation resistant, but they do this by consuming more resource up front.
while this doesn't apply to usermode daemon code, kernel mode developers avoid C++, again because of the implicit code C++ generates, and the exceptions C++ libraries use to handle errors. Most c++ compilers implement c++ exceptions in terms of hardware exceptions, and lots of kernel mode code is executed in environments where exceptions are not allowed to be thrown. Also, all the implicit code generated by c++, being implicit, cannot be wrapped in #pragma directives to guarantee its placement in pageable, or non pageable memory.
As a result, C++ is not possible for kernel development on any platform at all, and generally shunned by daemon developers too. Even if one's code is written using the proper smart memory management classes and does not leak - keeping on top of potential memory fragmentation issues makes languages where memory allocation is explicit a preferred choice.

I would recommend whichever you feel more comfortable with. If you are more comfortable with C++, your code is going to be cleaner, and run more efficiently, as you'll be more used to it, if you know what I mean.
The same applies on a larger scale to something like a Python vs Perl discussion. Whichever you are more comfortable with will probably produce better code, because you'll have experience.

I think the reason is that ANSI C is the standard programming language in Linux. It is important to follow this standard whenever people want to share their code with others etc. But it is not a requirement if you just want to write something for yourself.
You personally can use C or C++ and the result will be identical. I think you should choose C++ if you know it well and can exploit some special object oriented features of it in your code. Don't look too much to other people here, if you are good in C++ just go and write your daemon in C++. I would personally write it in C++ as well.

You're right. The reason for not using C++ is KISS, particularly if you ever intend for someone else to maintain your code down the road. Most folks that I know of learned to write daemons from existing source or reading books by Stevens. Pretty much that means your examples will be in C. C++ is just fine, I've written daemons in it myself, but I think if you expect it to be maintained and you don't know who the maintainer might be down the road it shows better foresight to write in C.

Boost makes it incredibly easy to write single threaded, or multi-threaded and highly scalable, networking daemons with the asio library.

I would recommend using C++, with a reservation on using exception handling and dynamic RTTI. These features may have run time performance cost implications and may not be supported well across platforms.
C++ is more modular and maintainable so if you can avoid these features go ahead and use it for your project.

Both C and C++ are perfectly suited for the task of writing daemons.
Besides that, nowadays, you should consider also scripting languages as Perl or Python. Performance is usually just good enough and you will be able to write applications more robust and in less time.
BTW, take a look at ACE, a framework for writting portable network applications in C++.

Using Mono SGen Garbage collector in C/C++ programs

Is it possible to use SGen garbage collector (from the mono runtime) in coventionnal C/C++ programs ?
I think mono used also the Boehm-Demers-Weiser conservative garbage collector that can be used in C/C++ programs.

There are very few dependencies on the rest of the Mono code in SGen, so it should be easy to extract it out and adapt to other uses.
The major difference from the Boehm collector is that it currently doesn't support a non-precise mode for heap objects, so you can't use it to replace malloc easily.
It would work great, though, for managing objects for which you could provide precise reference information.

Not sure about the garbage collector that you have specified. But do you really need to use a GC on a C++ project? I never felt the use of a GC in my C++ projects. You should be good if you follow the best practices and use a decent smart pointer.

Porting Symbian C++ to Android NDK

I've been given some Symbian C++ code to port over for use with the Android NDK.
The code has lots of Symbian specific code in it and I have very little experience of C++ so its not going very well.
The main thing that is slowing me down is trying to figure out the alternatives to use in normal C++ for the Symbian specific code.
At the minute the compiler is throwing out all sorts of errors for unrecognised types.
From my recent research these are the types that I believe are Symbian specific:
TInt, TBool, TDesc8, RSocket, TInetAddress, TBuf, HBufc,
RPointerArray
Changing TInt and TBool to int and bool respectively works in the compiler but I am unsure what to use for the other types?
Can anyone help me out with them? Especially TDesc, TBuf, HBuf and RPointerArray.
Also Symbian has a two phase contructor using
NewL
and
NewLc
But would changing this to a normal C++ constructor be ok?
Finally Symbian uses the clean up stack to help eliminate memory leaks I believe, would removing the clean up stack code be acceptable, I presume it should be replaced with try/catch statements?

I'm not sure whether you're still interested, but one possibility is that where the Symbian idioms are using the EUSER.DLL (i.e. TDesC derived classes, RPointer*, etc) you may find taking the open source EPL code from Symbian developer site and adding it directly into your port a viable option. That is port over the necessary bits of EUSER (and others perhaps?).
However, if your current code base already uses a lot of the other subsystems you're going to see this become very unwieldy.

You should try to read some introductory text on development for Symbian. They used to have some examples in the Symbian site, and I am sure that you can find specific docs on how the types you want are meant to be used and what they provide.
The problem is that symbian development has its own idioms that cannot/should not be directly used outside of the symbian environment, as for example the two phase construction with the cleanup stack is unneeded in environments where the compiler has proper exception handling mechanisms --in Symbian a constructor that throws can lead to all sorts of mayhem.

If this is not a very large codebase it may be easier/faster to start from scratch and doing everything Android style. Even if you require NDK/C++ this approach may be quicker.
Another approach may be to use portable C/C++ for the core, and the use this on both Symbian and Android version while doing UI stuff separately for each platform. Spotify have done this on Android and iPhone.

It would typically be a bad idea to try and port Symbian OS C++ to standard C++ without having a very good understanding of what the Symbian idioms do.
This could very well be one of these projects where the right thing to do is to rewrite most of the code pretty much from scratch. If you barely know the language you are targetting, there is little point in deluding yourself into thinking you won't make mistakes, waste time and throw away new code anyway. It's all part of learning.
The CleanupStack mechanism is meant to help you deal with anything that could go wrong, including power outage and out of memory conditions. Technically, these days, it is implemented as C++ exceptions but it covers more than the usual error cases standard C++ code normally handles.
Descriptors (TDesc, TBuf and HBuf all belong to the descriptor class hierarchy) and templates (arrays, queues, lists...) predate their equivalent in standard C++ while dealing with issues like the CleanupStack, coding standards, memory management and integrity...
A relevant plug if you want to learn about it: Quick Recipes On Symbian OS is a recent attempt at explaning it all in as few pages as possible.
You should also definitely look at the Foundation website to get started.
Classes prefixed by T are meant to be small enough by themselves that they can be allocated on the stack.
Descriptor classes suffixed by C are meant to be immutable (A mutable descriptor can usually be created from them, though).
HBufC is pretty much the only Symbian class prefixed by H. It should always be allocated on the Heap.
A method suffixed by C will add an object on the CleanupStack when it returns successfully (usually, it's the object it returns). It's up to the calling code to Pop that object.
Classes prefixed by R are meant to be allocated on the stack but manage their own heap-based resources. They usually have some kind of Close() method that needs to be called before their destructor.
A typical way to thing about the differences between a collection of objects and a collection of pointers to object is who owns the objects in the collection. Either the collection owns the objects when they are added and looses them when they are removed (and is therefore responsible for deleting each object it still contains when it is itself destroyed) or the collection doesn't transfer ownership and something else must ensure the objects it contains will stay valid during the collection's lifetime.
Another way to think about collections is about how much copying of objects you want to happen when you add/get objects to/from the collection.
Symbian descriptor and collection classes are meant to cover all these different ways of using memory and let you choose the one you need based on what you want to do.
It's certainly not easy to do it right but that's how this operating system works.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js