Converting C source to C++ - c++

How would you go about converting a reasonably large (>300K), fairly mature C codebase to C++?
The kind of C I have in mind is split into files roughly corresponding to modules (i.e. less granular than a typical OO class-based decomposition), using internal linkage in lieu private functions and data, and external linkage for public functions and data. Global variables are used extensively for communication between the modules. There is a very extensive integration test suite available, but no unit (i.e. module) level tests.
I have in mind a general strategy:
Compile everything in C++'s C subset and get that working.
Convert modules into huge classes, so that all the cross-references are scoped by a class name, but leaving all functions and data as static members, and get that working.
Convert huge classes into instances with appropriate constructors and initialized cross-references; replace static member accesses with indirect accesses as appropriate; and get that working.
Now, approach the project as an ill-factored OO application, and write unit tests where dependencies are tractable, and decompose into separate classes where they are not; the goal here would be to move from one working program to another at each transformation.
Obviously, this would be quite a bit of work. Are there any case studies / war stories out there on this kind of translation? Alternative strategies? Other useful advice?
Note 1: the program is a compiler, and probably millions of other programs rely on its behaviour not changing, so wholesale rewriting is pretty much not an option.
Note 2: the source is nearly 20 years old, and has perhaps 30% code churn (lines modified + added / previous total lines) per year. It is heavily maintained and extended, in other words. Thus, one of the goals would be to increase mantainability.
[For the sake of the question, assume that translation into C++ is mandatory, and that leaving it in C is not an option. The point of adding this condition is to weed out the "leave it in C" answers.]

Having just started on pretty much the same thing a few months ago (on a ten-year-old commercial project, originally written with the "C++ is nothing but C with smart structs" philosophy), I would suggest using the same strategy you'd use to eat an elephant: take it one bite at a time. :-)
As much as possible, split it up into stages that can be done with minimal effects on other parts. Building a facade system, as Federico Ramponi suggested, is a good start -- once everything has a C++ facade and is communicating through it, you can change the internals of the modules with fair certainty that they can't affect anything outside them.
We already had a partial C++ interface system in place (due to previous smaller refactoring efforts), so this approach wasn't difficult in our case. Once we had everything communicating as C++ objects (which took a few weeks, working on a completely separate source-code branch and integrating all changes to the main branch as they were approved), it was very seldom that we couldn't compile a totally working version before we left for the day.
The change-over isn't complete yet -- we've paused twice for interim releases (we aim for a point-release every few weeks), but it's well on the way, and no customer has complained about any problems. Our QA people have only found one problem that I recall, too. :-)

What about:
Compiling everything in C++'s C subset and get that working, and
Implementing a set of facades leaving the C code unaltered?
Why is "translation into C++ mandatory"? You can wrap the C code without the pain of converting it into huge classes and so on.

Your application has lots of folks working on it, and a need to not-be-broken.
If you are serious about large scale conversion to an OO style, what
you need is massive transformation tools to automate the work.
The basic idea is to designate groups of data as classes, and then
get the tool to refactor the code to move that data into classes,
move functions on just that data into those classes,
and revise all accesses to that data to calls on the classes.
You can do an automated preanalysis to form statistic clusters to get some ideas,
but you'll still need an applicaiton aware engineer to decide what
data elements should be grouped.
A tool that is capable of doing this task is our DMS Software Reengineering
Toolkit.
DMS has strong C parsers for reading your code, captures the C code
as compiler abstract syntax trees, (and unlike a conventional compiler)
can compute flow analyses across your entire 300K SLOC.
DMS has a C++ front end that can be used as the "back" end;
one writes transformations that map C syntax to C++ syntax.
A major C++ reengineering task on a large avionics system gives
some idea of what using DMS for this kind of activity is like.
See technical papers at
www.semdesigns.com/Products/DMS/DMSToolkit.html,
specifically
Re-engineering C++ Component Models Via Automatic Program Transformation
This process is not for the faint of heart. But than anybody
that would consider manual refactoring of a large application
is already not afraid of hard work.
Yes, I'm associated with the company, being its chief architect.

I would write C++ classes over the C interface. Not touching the C code will decrease the chance of messing up and quicken the process significantly.
Once you have your C++ interface up; then it is a trivial task of copy+pasting the code into your classes. As you mentioned - during this step it is vital to do unit testing.

GCC is currently in midtransition to C++ from C. They started by moving everything into the common subset of C and C++, obviously. As they did so, they added warnings to GCC for everything they found, found under -Wc++-compat. That should get you on the first part of your journey.
For the latter parts, once you actually have everything compiling with a C++ compiler, I would focus on replacing things that have idiomatic C++ counterparts. For example, if you're using lists, maps, sets, bitvectors, hashtables, etc, which are defined using C macros, you will likely gain a lot by moving these to C++. Likewise with OO, you'll likely find benefits where you are already using a C OO idiom (like struct inheritence), and where C++ will afford greater clarity and better type checking on your code.

Your list looks okay except I would suggest reviewing the test suite first and trying to get that as tight as possible before doing any coding.

Let's throw another stupid idea:
Compile everything in C++'s C subset and get that working.
Start with a module, convert it in a huge class, then in an instance, and build a C interface (identical to the one you started from) out of that instance. Let the remaining C code work with that C interface.
Refactor as needed, growing the OO subsystem out of C code one module at a time, and drop parts of the C interface when they become useless.

Probably two things to consider besides how you want to start are on what you want to focus, and where you want to stop.
You state that there is a large code churn, this may be a key to focus your efforts. I suggest you pick the parts of your code where a lot of maintenance is needed, the mature/stable parts are apparently working well enough, so it is better to leave them as they are, except probably for some window dressing with facades etc.
Where you want to stop depends on what the reason is for wanting to convert to C++. This can hardly be a goal in itself. If it is due to some 3rd party dependency, focus your efforts on the interface to that component.
The software I work on is a huge, old code base which has been 'converted' from C to C++ years ago now. I think it was because the GUI was converted to Qt. Even now it still mostly looks like a C program with classes. Breaking the dependencies caused by public data members, and refactoring the huge classes with procedural monster methods into smaller methods and classes never has really taken off, I think for the following reasons:
There is no need to change code that is working and that does not need to be enhanced. Doing so introduces new bugs without adding functionality, and end users don't appreciate that;
It is very, very hard to do refactor reliably. Many pieces of code are so large and also so vital that people hardly dare touching it. We have a fairly extensive suite of functional tests, but sufficient code coverage information is hard to get. As a result, it is difficult to establish whether there are already sufficient tests in place to detect problems during refactoring;
The ROI is difficult to establish. The end user will not benefit from refactoring, so it must be in reduced maintenance cost, which will increase initially because by refactoring you introduce new bugs in mature, i.e. fairly bug-free code. And the refactoring itself will be costly as well ...
NB. I suppose you know the "Working effectively with Legacy code" book?

You mention that your tool is a compiler, and that: "Actually, pattern matching, not just type matching, in the multiple dispatch would be even better".
You might want to take a look at maketea. It provides pattern matching for ASTs, as well as the AST definition from an abstract grammar, and visitors, tranformers, etc.

If you have a small or academic project (say, less than 10,000 lines), a rewrite is probably your best option. You can factor it however you want, and it won't take too much time.
If you have a real-world application, I'd suggest getting it to compile as C++ (which usually means primarily fixing up function prototypes and the like), then work on refactoring and OO wrapping. Of course, I don't subscribe to the philosophy that code needs to be OO structured in order to be acceptable C++ code. I'd do a piece-by-piece conversion, rewriting and refactoring as you need to (for functionality or for incorporating unit testing).

Here's what I would do:
Since the code is 20 years old, scrap down the parser/syntax analyzer and replace it with one of the newer lex/yacc/bison(or anything similar) etc based C++ code, much more maintainable and easier to understand. Faster to develop too if you have a BNF handy.
Once this is retrofitted to the old code, start wrapping modules into classes. Replace global/shared variables with interfaces.
Now what you have will be a compiler in C++ (not quite though).
Draw a class diagram of all the classes in your system, and see how they are communicating.
Draw another one using the same classes and see how they ought to communicate.
Refactor the code to transform the first diagram to the second. (this might be messy and tricky)
Remember to use C++ code for all new code added.
If you have some time left, try replacing data structures one by one to use the more standardized STL or Boost.

Related

How do I effectively manage a Clojure code base?

A coworker and I are Clojure newbies. We started a project a couple months back, but quickly found that we had a tough time dealing with our code base -- by 500 LOC we basically had no idea where to start with the debugging, when things went wrong (which was often). Instead of pairs, functions were getting lists, or numbers, or what-have-you.
Now we're starting a new but related project and migrating a lot of the old code over. But we're again hitting a wall.
We're wondering, how do we effectively manage a Clojure project, especially as we make changes to existing code?
What we've come up with:
liberal use of unit-tests
liberal use of pre-, post-conditions
informal type declarations in function comments
use defrecord/defstruct/defprotocol to implement a data model, which would really simplify testing
But post-, pre-conditions seem not to be used very often. Unit-testing + comments will only help so much. And it seems like Clojure programmers don't typically implement formal data models.
Do we just not get Clojure? How do Clojure programmers know that their code is robust and correct?
I think this is actually an evolving area - Clojure hasn't really been around long enough for all of the best practices and associated tools for managing a large code base to be developed yet.
Some suggestions from my experience:
Structure your code in a "bottom up" way - in general, the way you want to structure you code will have the "utility" code at the top of the file (or imported from another namespace) and the "business logic" code that uses these utility functions towards the end of the file. If this seems difficult to do, then it's probably a hint that your code needs some refactoring.
Tests as examples - Test code in clojure works very well both to sanity check your code but also as documentation (e.g. "what kind of parameter is this function expecting?"). If you hit a bug, refer to your tests to check your assumptions and write a couple of new tests to flush out what is going wrong.
Keep functions simple and compose them - Kind of an extension of the "single responsibility principle" to functional programming. I consider more than 5-10 lines in a Clojure function as a major code smell (if this seems extreme, just remember that you can probably achieve as much in 5-10 lines of Clojure as you could with 50-100 lines of Java/C#)
Watch out for "imperative habits" - when I first started using Clojure, I wrote a lot of pseudo-imperative code in Clojure. An example would be emulating a for loop with "dotimes" and accumulating some result within an atom. This can be painful - it's not idiomatic, it's confusing and usually there is a much smarter, simpler and less error-prone functional way of doing it. This takes practice, but it is worth it in the long run...
Debug at the REPL - usually when I hit an issue, coding at the REPL is the easiest way to flush it out. Generally this means running some specific parts of the larger algorithm to check assumptions etc.
Refactor common utility functions out - you'll probably find a bunch of common or structure repeated in many functions. Well worth pulling this out into a function or macro that you can re-use in other places or projects - that way you can test it much more rigorously and have the benefits in multiple places. Bonus points if you can get it all the way upstream into Clojure itself! If you do this well enough, then your main code base will be extremely succinct and therefore easy to manage, containing nothing but the genuinely domain-specific code.
simple composable abstractions
"It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." - Alan J. Perlis
For me its all about composing simple functions. Try to break every function down into the smallest units you can and then have another function that composes them to do the work your need. You know you are in good shape is every function can be tested independently. If you go too heavy on the macroes then it can make this step harder because macroes compose differently.
D.R.Y, Seriously, just don't repeat yourself
starting with well decomposed functions in a a bunch of namespaces; every time I need one of the composable parts somewhere else I "hoist" that function up to a library included by both namespaces. This way your commonly used abstractions sort of evolve over the course of the project into "just enough framework". It is very difficult to do this unless you really have discrete composable abstractions.
Sorry to dig up this old question, the answers by mikera and Arthur are excellent, but it's something I've also wondered about as I've been learning Clojure, and thought I'd mention how we organise files.
In a similar vein to ensuring each function has a single job, we group related functions into namespaces to make it easier to navigate the code. So we might have a namespace for functions providing access to a particular database, or providing a collection of HTTP-related utilities. This keeps each file relatively small, and makes tests easier to find. It also makes refactoring much more straightforward. This is hardly anything new, but it's worth bearing in mind.

How to update old C code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I have been working on some 10 year old C code at my job this week, and after implementing a few changes, I went to the boss and asked if he needed anything else done. That's when he dropped the bomb. My next task was to go through the 7000 or so lines and understand more of the code, and to modularize the code somewhat. I asked him how he would like the source code modularized, and he said to start putting the old C code into C++ classes.
Being a good worker, I nodded my head yes, and went back to my desk, where I sit now, wondering how in the world to take this code, and "modularize" it. It's already in 20 source files, each with its own purpose and function. In addition, there are three "main" structs. each of these structures has 30 plus fields, many of them being other, smaller structs. It's a complete mess to try to understand, but almost every single function in the program is passed a pointer to one of the structs and uses the struct heavily.
Is there any clean way for me to shoehorn this into classes? I am resolved to do it if it can be done, I just have no idea how to begin.
First, you are fortunate to have a boss who recognizes that code refactoring can be a long-term cost-saving strategy.
I've done this many times, that is, converting old C code to C++. The benefits may surprise you. The final code may be half the original size when you're done, and much simpler to read. Plus, you will likely uncover tricky C bugs along the way. Here are the steps I would take in your case. Small steps are important because you can't jump from A to Z when refactoring a large body of code. You have to go through small, intermediate steps which may never be deployed, but which can be validated and tagged in whatever RCS you are using.
Create a regression/test suite. You will run the test suite each time you complete a batch of changes to the code. You should have this already, and it will be useful for more than just this refactoring task. Take the time to make it comprehensive. The exercise of creating the test suite will get you familiar with the code.
Branch the project in your revision control system of choice. Armed with a test suite and playground branch, you will be empowered to make large modifications to the code. You won't be afraid to break some eggs.
Make those struct fields private. This step requires very few code changes, but can have a big payoff. Proceed one field at a time. Try to make each field private (yes, or protected), then isolate the code which access that field. The simplest, most non-intrusive conversion would be to make that code a friend function. Consider also making that code a method. Converting the code to be a method is simple, but you will have to convert all of the call sites as well. One is not necessarily better than the other.
Narrow the parameters to each function. It's unlikely that any function requires access to all 30 fields of the struct passed as its argument. Instead of passing the entire struct, pass only the components needed. If a function does in fact seem to require access to many different fields of the struct, then this may be a good candidate to be converted to an instance method.
Const-ify as many variables, parameters, and methods as possible. A lot of old C code fails to use const liberally. Sweeping through from the bottom up (bottom of the call graph, that is), you will add stronger guarantees to the code, and you will be able to identify the mutators from the non-mutators.
Replace pointers with references where sensible. The purpose of this step has nothing to do with being more C++-like just for the sake of being more C++-like. The purpose is to identify parameters that are never NULL and which can never be re-assigned. Think of a reference as a compile-time assertion which says, this is an alias to a valid object and represents the same object throughout the current scope.
Replace char* with std::string. This step should be obvious. You might dramatically reduce the lines of code. Plus, it's fun to replace 10 lines of code with a single line. Sometimes you can eliminate entire functions whose purpose was to perform C string operations that are standard in C++.
Convert C arrays to std::vector or std::array. Again, this step should be obvious. This conversion is much simpler than the conversion from char to std::string because the interfaces of std::vector and std::array are designed to match the C array syntax. One of the benefits is that you can eliminate that extra length variable passed to every function alongside the array.
Convert malloc/free to new/delete. The main purpose of this step is to prepare for future refactoring. Merely changing C code from malloc to new doesn't directly gain you much. This conversion allows you to add constructors and destructors to those structs, and to use built-in C++ automatic memory tools.
Replace localize new/delete operations with the std::auto_ptr family. The purpose of this step is to make your code exception-safe.
Throw exceptions wherever return codes are handled by bubbling them up. If the C code handles errors by checking for special error codes then returning the error code to its caller, and so on, bubbling the error code up the call chain, then that C code is probably a candidate for using exceptions instead. This conversion is actually trivial. Simply throw the return code (C++ allows you to throw any type you want) at the lowest level. Insert a try{} catch(){} statement at the place in the code which handles the error. If no suitable place exists to handle the error, consider wrapping the body of main() in a try{} catch(){} statement and logging it.
Now step back and look how much you've improved the code, without converting anything to classes. (Yes, yes, technically, your structs are classes already.) But you haven't scratched the surface of OO, yet managed to greatly simplify and solidify the original C code.
Should you convert the code to use classes, with polymorphism and an inheritence graph? I say no. The C code probably does not have an overall design which lends itself to an OO model. Notice that the goal of each step above has nothing to do with injecting OO principles into your C code. The goal was to improve the existing code by enforcing as many compile-time constraints as possible, and by eliminating or simplifying the code.
One final step.
Consider adding benchmarks so you can show them to your boss when you're done. Not just performance benchmarks. Compare lines of code, memory usage, number of functions, etc.
Really, 7000 lines of code is not very much. For such a small amount of code a complete rewrite may be in order. But how is this code going to be called? Presumably the callers expect a C API? Or is this not a library?
Anyway, rewrite or not, before you start, make sure you have a suite of tests which you can run easily, with no human intervention, on the existing code. Then with every change you make, run the tests on the new code.
This shoehorning into C++ seems to be arbitrary, ask your boss why he needs that done, figure out if you can meet the same goal less painfully, see if you can prototype a subset in the new less painful way, then go and demo to your boss and recommend that you follow the less painful way.
First, tell your boss you're not continuing until you have:
http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672
and to a lesser extent:
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
Secondly, there is no way of modularising code by shoe-horning it into C++ class. This is a huge task and you need to communicate the complexity of refactoring highly proceedural code to your boss.
It boils down to making a small change (extract method, move method to class etc...) and then testing - there is no short cuts with this.
I do feel your pain though...
I guess that the thinking here is that increasing modularity will isolate pieces of code, such that future changes are facilitated. We have confidence in changing one piece because we know it cannot affect other pieces.
I see two nightmare scenarios:
You have nicely structured C code, it will easily transform to C++ classes. In which case it probably already is pretty darn modular, and you've probably done nothing useful.
It's a rats-nest of interconnected stuff. In which case it's going to be really tough to disentangle it. Increasing modularity would be good, but it's going to be a long hard slog.
However, maybe there's a happy medium. Could there be pieces of logic that important and conceptually isolated but which are currently brittle because of a lack of data-hiding etc. (Yes good C doesn't suffer from this, but we don't have that, otherwise we would leave well alone).
Pulling out a class to own that logic and its data, encpaulating that piece could be useful. Whether it's better to do it wih C or C++ is open to question. (The cynic in me says "I'm a C programmer, great C++ a chance to learn something new!")
So: I'd treat this as an elephant to be eaten. First decide if it should be eaten at all, bad elephent is just no fun, well structured C should be left alone. Second find a suitable first bite. And I'd echo Neil's comments: if you don't have a good automated test suite, you are doomed.
I think a better approach could be totally rewrite the code, but you should ask your boss for what purpose he wants you "to start putting the old C code into c++ classes".
You should ask for more details
Surely it can be done - the question is at what cost? It is a huge task, even for 7K LOC. Your boss must understand that it's gonna take a lot of time, while you can't work on shiny new features etc. If he doesn't fully understand this, and/or is not willing to support you, there is no point starting.
As #David already suggested, the Refactoring book is a must.
From your description it sounds like a large part of the code is already "class methods", where the function gets a pointer to a struct instance and works on that instance. So it could be fairly easily converted into C++ code. Granted, this won't make the code much easier to understand or better modularized, but if this is your boss' prime desire, it can be done.
Note also, that this part of the refactoring is a fairly simple, mechanical process, so it could be done fairly safely without unit tests (with hyperaware editing of course). But for anything more you need unit tests to make sure your changes don't break anything.
It's very unlikely that anything will be gained by this exercise. Good C code is already more modular than C++ typically can be - the use of pointers to structs allows compilation units to be independent in the same was as pImpl does in C++ - in C you don't have to expose the data inside a struct to expose its interface. So if you turn each C function
// Foo.h
typedef struct Foo_s Foo;
int foo_wizz (const Foo* foo, ... );
into a C++ class with
// Foo.hxx
class Foo {
// struct Foo members copied from Foo.c
int wizz (... ) const;
};
you will have reduced the modularity of the system compared with the C code - every client of Foo now needs rebuilding if any private implementation functions or member variables are added to the Foo type.
There are many things classes in C++ do give you, but modularity is not one of them.
Ask your boss what the business goals are being achieved by this exercise.
Note on terminology:
A module in a system is a component with a well defined interface which can be replaced with another module with the same interface without effecting the rest of the system. A system composed of such modules is modular.
For both languages, the interface to a module is by convention a header file. Consider string.h and string as defining the interfaces to simple string processing modules in C and C++. If there is a bug in the implementation of string.h, a new libc.so is installed. This new module has the same interface, and anything dynamically linked to it immediately gets the benefit of the new implementation. Conversely, if there is a bug in string handling in std::string, then every project which uses it needs to be rebuilt. C++ introduces a very large amount of coupling into systems, which the language does nothing to mitigate - in fact, the better uses of C++ which fully exploit its features are often a lot more tightly coupled than the equivalent C code.
If you try and make C++ modular, you typically end up with something like COM, where every object has to have both an interface (a pure virtual base class) and an implementation, and you substitute an indirection for efficient template generated code.
If you don't care about whether your system is composed of replaceable modules, then you don't need to perform actions to to make it modular, and can use some of the features of C++ such as classes and templates which, suitable applied, can improve cohesion within a module. If your project is to produce a single, statically linked application then you don't have a modular system, and you can afford to not care at all about modularity. If you want to create something like anti-grain geometry which is beautiful example of using templates to couple together different algorithms and data structures, then you need to do that in C++ - pretty well nothing else widespread is as powerful.
So be very careful what your manager means by 'modularise'.
If every file already has "its own purpose and function" and "every single function in the program is passed a pointer to one of the structs" then the only difference made in changing it into classes would be to replace the pointer to the struct with the implicit this pointer. That would have no effect on how modularised the system is, in fact (if the struct is only defined in the C file rather than in the header) it will reduce modularity.
With “just” 7000 lines of C code, it will probably be easier to rewrite the code from scratch, without even trying to understand the current code.
And there is no automated way to do or even assist the modularization and refactoring that you envisage.
7000 LOC may sound like much but a lot of this will be boilerplate.
Try and see if you can simplify the code before changing it to c++. Basically though I think he just wants you to convert functions into class methods and convert structs into class data members (if they don't contain function pointers, if they do then convert these to actual methods). Can you get in touch with the original coder(s) of this program? They could help you get some understanding done but mainly I would be searching for that piece of code that is the "engine" of the whole thing and base the new software from there. Also, my boss told me that sometimes it is better to simply rewrite the whole thing, but the existing program is a very good reference to mimic the run time behavior of. Of course specialized algorithms are hard to recode. One thing I can assure you of is that if this code is not the best it could be then you are going to have alot of problems later on. I would go up to your boss and promote the fact that you need to redo from scratch parts of the program. I have just been there and I am really happy my supervisor gave me the ability to rewrite. Now the 2.0 version is light years ahead of the original version.
I read this article which is titled "Make bad code good" from http://www.javaworld.com/javaworld/jw-03-2001/jw-0323-badcode.html?page=7 . Its directed at Java users, but all of its ideas our pretty applicable to your case I think. Though the title makes it sound likes it is only for bad code, I think the article is for maintenance engineers in general.
To summarize Dr. Farrell's ideas, he says:
Start with the easy things.
Fix the comments
Fix the formatting
Follow project conventions
Write automated tests
Break up big files/functions
Rewrite code you don't understand
I think after following everyone else's advice this might be a good article to read when you have some free time.
Good luck!

Write C++ in a graphical scratch-like way?

I am considering the possibility of designing an application that would allow people to develop C++ code graphically. I was amazed when I discovered Scratch (see site and tutorial videos).
I believe most of C++ can be represented graphically, with the exceptions of preprocessor instructions and possibly function pointers.
What C++ features do you think could be (or not be) represented by graphical items?
What would be the pros and cons of such an application ? How much simpler would it be than "plain" C++?
RECAP and MORE:
Pros:
intuitive
simple for small applications
helps avoid typos
Cons:
may become unreadable for large (medium?) - sized applications
manual coding is faster for experienced programmers
C++ is too complicated a language for such an approach
Considering that we -at my work- already have quite a bit of existing C++ code, I am not looking for a completely new way of programming. I am considering an alternate way of programming that is fully compatible with legacy code. Some kind of "viral language" that people would use for new code and, hopefully, would eventually use to replace existing code as well (where it could be useful).
How do you feel towards this viral approach?
When it comes to manual vs graphical programming, I tend to agree with your answers. This is why, ideally, I'll find a way to let the user always choose between typing and graphical programming. A line-by-line parser (+partial interpreter) might be able to convert typed code into graphical design. It is possible. Let's all cross our fingers.
Are there caveats to providing both typing and graphical programming capabilities that I should think about and analyze carefully?
I have already worked on template classes (and more generally type-level C++) and their graphical representation.
See there for an example of graphical representation of template classes. Boxes represent classes or class templates. First top node is the class itself, the next ones (if any) are typedef instructions inside the class. Bottom nodes are template arguments. Edges, of course, connect classes to template arguments for instantiations.
I already have a prototype for working on such type-level diagrams.
If you feel this way of representing template classes is plain wrong, don't hesitate to say so and why!
Much as I like Scratch, it is still much quicker for an experienced programmer to write code using a text editor than it is to drag blocks around, This has been proved time and again with any number of graphical programming environments.
Writing code is the easiest part of a developers day. I don't think we need more help with that. Reading, understanding, maintaining, comparing, annotating, documenting, and validating is where - despite a gargantuan amount of tools and frameworks - we still are lacking.
To dissect your pros:
Intuitive and simple for small applications - replace that with "misleading". It makes it look simple, but it isn't: As long as it is simple, VB.NET is simpler. When it gets complicated, visual design would get in the way.
Help avoid typos - that's what a good style, consistency and last not least intellisense are for. The things you need anyway when things aren't simple anymore.
Wrong level
You are thinking on the wrong level: C++ statements are not reusable, robust components, they are more like a big bag of gears that need to be put together correctly. C++ with it's complexity and exceptions (to rules) isn't even particulary suited.
If you want to make things easy, you need reusable components at a much higher level. Even if you have these, plugging them together is not simple. Despite years of struggle, and many attempts in many environments, this sometimes works and often fails.
Viral - You are correct IMO about that requriement: allow incremental adoption. This is closely related to switching smoothly between source code and visual representation, which in turn probably means you must be able to generate the visual representation from modified source code.
IDE Support - here's where most language-centered approaches go astray. A modern IDE is more than just a text editor and a compiler. What about debugging your graph - with breakpoints, data inspection etc? Will profilers, leak detectors etc. highlight nodes in your graph? Will source control give me a Visual Diff of yesterday's graph vs. today's?
Maybe you are on to something, despite all my "no"s: a better way to visualize code, a way to put different filters on it so that I see just what I need to see.
The early versions of C++ were originally written so that they compiled to C, then the C was compiled as normal.
What it sounds like you are describing is a graphical language that is compiled to C++, which will then be compiled as normal.
So really you are not creating a graphical C++, you are creating a new language that happens to be graphical. Nothing wrong with that, but don't let C++ restrict what you do, because eventually you may want to compile the graphical language straight to machine code, or even to something like CIL, Java ByteCode, or whatever else tickles your fancy.
Other graphical languages you may want to check out are LabVIEW, and more generally the category of visual programming languages.
Good luck in your efforts.
The complexity of a nontrivial program is usually too high to be represented with graphical symbols, which are low in their information content. Unless your approach is markedly different in some way, I am skeptical that this would be of value based on past efforts.
So, practically speaking, his will be useful only for instructional purposes and very simple programs. But that would still be a great target market for a product like this. sometimes people have trouble grasping the fundamentals, and a visual model might be just the thing to help things click.
Interesting idea. I doubt I'd use it though. I tend to prefer coding in a flat text editor, not even an IDE, and for tough problems I prefer a pad of paper. Most of the really experienced programmers I know work this way, Maybe it's because we grew up in a different environment, but I think it's also because of the way we think about programming. As you get more experience, you start seeing the code in your head more clearly than any GUI tool will show it to you.
As for your question, I'd nominate templates as one of the harder / more interesting sort of thing to try to represent well. They are ubiquitous and carry information that you won't have access to as you are designing your tool. Getting that to the user in a useful way should pose an interesting challenge.
What C++ features do you think could be [...] represented by graphical items?
Object Oriented Design. Hence classes, inheritance, polymorphism, mutability, const-ness etc. And, templates.
What would be the pros and cons of such an application?
It may be easier for beginners to start writing programs. For the experienced, it may be get rid of the boring parts of programming.
Think of any other code generator. They create a framework for you to write the more involved portion(s). They also lead to bloated-code (think of any WYSIWYG HTML editor).
The biggest challenge, as I see it, is that any such UI necessarily hinders the user's imagination.
How much simpler would it be than "plain" c++ ?
It can be a real pain, when you wade through truckloads of errors which is typical of code generators.
Further, since a lot of code is generated, you have no idea of what is going on -- debugging becomes difficult.
Also, for the experienced there may be some irritation to find that the generated code is not per their preferred coding style.
I prefer hot-keys instead graphical menus and buttons.
And I think same thing will happen with graphical development tool. Many peoples will prefer manual codding.
But, source code visualizer - should be nice thing.
I like the idea, but I suspect there comes a point where things get far too complicated to be represented graphically.
However, given recent experience at work; it would be useful to give such a graphical interface to a non-techie person to use to create basic drag-and-drop programs, leaving myself free to get on with some "proper" programming ;-) If it can do the job of allowing somebody non-skilled to build something functional it can be a very good thing (even if programming logic escapes them)
There comes a point in such a system where it becomes easier to define what you want to do using literal C++ code, rather than have a user interface getting in the way; it can get frustrating to the sessioned programmer knowing the precise code that needs to be written but then only being limited to the design GUI. I'm specifically thinking about a more common application, such as html editors/designers in which they allow newbies to build their websites without knowing any html at all.
It would be interesting to see how such a system would handle the dynamic allocation of memory, and the different states of a program as time progressed; I suspect that there are some very basic programming concepts that may be difficult to represent graphically.. Polymorphism..? Virtual Classes, LinkList, Stacks/Circular Queues. I wonder for a moment how you would explain an image compression algorithm (such as jpg) successfully too without the help of a gigantic display screen.
I also wonder if such a system would even go to such a low level, and whether you would be dealing with abstracted concepts and the compiler would be working out the best way to do something.
I've been working on a new model-driven software development paradigm named ABSE (http://www.abse.info) that supports end-user programming: It's a template-based system that can be complemented with transformation code. I also have an IDE (named AtomWeaver) implementing ABSE that is in pre-alpha stage right now.
With AtomWeaver, as an expert/architect, you build your knowledge Templates, and then the developers (or end-users if you make your meta-models simpler) can just "assemble" systems by building blocks, and then filling template parameters in form-style editors.
At the end, pressing the "Generate" button will create the final system as specified by the architect/expert.
I'm surprised you think function pointers would be a particular problem. How about anything at all to do with pointers?
A programming language can be represented by a hierarchy of nodes - that's exactly what the compiler turns it into. It is very strange that the UI for editing programs is still a sequence of characters that get parsed, because the degrees of freedom in the editor is way larger than the available set of allowed choices. But intellisense helps to reduce this problem a lot.
C++ would be a strange choice to base such a system on.
I think the major problem of this kind of IDEs are that the code generated becomes unmantainable easily.
This happened to Delphi. It's a really nice tool to develop some kind of applications, however, when we start adding complex relationships between the components, start adding Design Patterns, etc. the code grows to an unmantainable size.
I believe it's also because graphical tools don't apply the concept of MVC (or if they do, it's only in the way that the IDE understands).
It can be really helpful for prototypes and very small applications that don't tend to grow, otherwise it can become a mess for the developer(s)

How can I make my own C++ compiler understand templates, nested classes, etc. strong features of C++?

It is a university task in my group to write a compiler of C-like language. Of course I am going to implement a small part of our beloved C++.
The exact task is absolutely stupid, and the lecturer told us it need to be self-compilable (should be able to compile itself) - so, he meant not to use libraries such as Boost and STL. He also does not want us to use templates because it is hard to implement.
The question is - is it real for me, as I`m going to write this project on my own, with the deadline at the end of May - the middle of June (this year), to implement not only templates, but also nested classes, namespaces, virtual functions tables at the level of syntax analysis?
PS I am not noobie in C++
Stick to doing a C compiler.
Believe me, it's hard enough work building a decent C compiler, especially if its expected to compile itself. Trying to support all the C++ features like nested classes and templates will drive you insane. Perhaps a group could do it, but on your own, I think a C compiler is more than enough to do.
If you are dead set on this, at least implement a C-like language first (so you have something to hand in). Then focus on showing off.
"The exact task is absolutely stupid" - I don't think you're in a position to make that judgment fairly. Better to drop that view.
"I`m going to write this project on my own" - you said it's a group project. Are you saying that your group doesn't want to go along with your view that it should morph into C++, so you're taking off and working on your own? There's another bit I'd recommend changing.
It doesn't matter how knowledgable you are about C++. Your ability with grammars, parsers, lexers, ASTs, and code generation seems far more germane.
Without knowing more about you or the assignment, I'd say that you'd be doing well to have the original assignment done by the end of May. That's three months away. Stick to the assignment. It might surprise you with its difficulty.
If you finish early, and fulfill your obligation to your team, I'd say you should feel free to modify what's produced to add C++ features.
I'll bet it took Bjarne Stroustrup more than three months to add objects to C. Don't overestimate yourself or underestimate the original assignment.
No problem. And while you're at it, why not implement an operating system for it to run on too.
Follow the assignment. Write a compiler for a C-like language!
What I'd do is select a subset of C. Remove floating-point datatypes and every other feature that isn't necessary in building your compiler.
Writing a C compiler is a lot of work. You won't be able to do that in a couple of months.
Writing a C++ compiler is downright insane. You wouldn't be able to do that in 5 years.
I will like to stress a few points already mentioned and give a few references.
1) STICK TO THE 1989 ANSI C STANDARD WITH NO OPTIMIZATION.
2) Don't worry, with proper guidance, good organization and a fair amount of hard work this is doable.
3) Read the The C Programming Language cover to cover.
4) Understand important concepts of compiler development from the Dragon Book.
5) Take a look at lcc both the code as well as the book.
6) Take a look at Lex and Yacc (or Flex and Bison)
7) Writing a C compiler (up to the point it can self compile) is a rite of passage ritual among programmers. Enjoy it.
For a class project, I think that requiring the compiler to be able to compile itself is a bit much to ask. I assume that this is what was meant by stupid in the question. It means that you need to figure out in advance exactly how much of C you are going to implement, and stick to that in building the compiler. So, building a symbol table using primitives rather than just using an STL map. This might be useful for a data structure course, but misses the point for a compiler course. It should be about understanding the issues involved with the compiler, and chosing which data structures to use, not coding the data structures.
Building a compiler is a wonderful way to really understand what happens to your code once the compiler get a hold of it. What is the target language? When I took compilers, it took 3 of us all semester to build a compiler to go from sorta-pascal to assembly. Its not a trivial task. Its one of those things that seems simple at first, but the more you get into it, the more complicated things get.
You should be able to complete c-like language within the time frame. Assuming you are taking more than 1 course, that is exactly what you might be able to do in time. C++ is also doable but with a lot more extra hours to put it. Expecing to do c++ templates/virtual functions is overexpecting yourself and you might fail in the assignment all together. So it's better stick with a c subset compiler and finish it in time. You should also consider the time it takes for QA. If you want to be thorough QA itself will also take good time.
Namespaces or nested clases, either virtual functions are at syntax level quite simple, its just one or two more rules to parser. It is much more complicated at higher levels, at deciding, which function / class choose (name shadowing, ambiguous names between namespaces, etc.), or when compiling to bytecode/running AST. So - you may be able to write these, but if isn't necessary, skip it, and write just bare functional model.
If you are talking about a complete compiler, with code generation, then forget it. If you just intend to do the lexical & syntactic analysis side of things, then some form of templating may just about be doable in the time frame, depending on what compiler building tools you use.

What can C++ do that is too hard or messy in any other language?

I still feel C++ offers some things that can't be beaten. It's not my intention to start a flame war here, please, if you have strong opinions about not liking C++ don't vent them here. I'm interested in hearing from C++ gurus about why they stick with it.
I'm particularly interested in aspects of C++ that are little known, or underutilised.
RAII / deterministic finalization. No, garbage collection is not just as good when you're dealing with a scarce, shared resource.
Unfettered access to OS APIs.
I have stayed with C++ as it is still the highest performing general purpose language for applications that need to combine efficiency and complexity. As an example, I write real time surface modelling software for hand-held devices for the surveying industry. Given the limited resources, Java, C#, etc... just don't provide the necessary performance characteristics, whereas lower level languages like C are much slower to develop in given the weaker abstraction characteristics. The range of levels of abstraction available to a C++ developer is huge, at one extreme I can be overloading arithmetic operators such that I can say something like MaterialVolume = DesignSurface - GroundSurface while at the same time running a number of different heaps to manage the memory most efficiently for my app on a specific device. Combine this with a wealth of freely available source for solving pretty much any common problem, and you have one heck of a powerful development language.
Is C++ still the optimal development solution for most problems in most domains? Probably not, though at a pinch it can still be used for most of them. Is it still the best solution for efficient development of high performance applications? IMHO without a doubt.
Shooting oneself in the foot.
No other language offers such a creative array of tools. Pointers, multiple inheritance, templates, operator overloading and a preprocessor.
A wonderfully powerful language that also provides abundant opportunities for foot shooting.
Edit: I apologize if my lame attempt at humor has offended some. I consider C++ to be the most powerful language that I have ever used -- with abilities to code at the assembly language level when desired, and at a high level of abstraction when desired. C++ has been my primary language since the early '90s.
My answer was based on years of experience of shooting myself in the foot. At least C++ allows me to do so elegantly.
Deterministic object destruction leads to some magnificent design patterns. For instance, while RAII is not as general a technique as garbage collection, it leads to some impressive capabilities which you cannot get with GC.
C++ is also unique in that it has a Turing-complete preprocessor. This allows you to prefer (as in the opposite of defer) a lot of code tasks to compile time instead of run time. For instance, in real code you might have an assert() statement to test for a never-happen. The reality is that it will sooner or later happen... and happen at 3:00am when you're on vacation. The C++ preprocessor assert does the same test at compile time. Compile-time asserts fail between 8:00am and 5:00pm while you're sitting in front of the computer watching the code build; run-time asserts fail at 3:00am when you're asleep in Hawai'i. It's pretty easy to see the win there.
In most languages, strategy patterns are done at run-time and throw exceptions in the event of a type mismatch. In C++, strategies can be done at compile-time through the preprocessor facility and can be guaranteed typesafe.
Write inline assembly (MMX, SSE, etc.).
Deterministic object destruction. I.e. real destructors. Makes managing scarce resources easier. Allows for RAII.
Easier access to structured binary data. It's easier to cast a memory region as a struct than to parse it and copy each value into a struct.
Multiple inheritance. Not everything can be done with interfaces. Sometimes you want to inherit actual functionality too.
I think i'm just going to praise C++ for its ability to use templates to catch expressions and execute it lazily when it's needed. For those not knowing what this is about, here is an example.
Template mixins provide reuse that I haven't seen elsewhere. With them you can build up a large object with lots of behaviour as though you had written the whole thing by hand. But all these small aspects of its functionality can be reused, it's particularly great for implementing parts of an interface (or the whole thing), where you are implementing a number of interfaces. The resulting object is lightning-fast because it's all inlined.
Speed may not matter in many cases, but when you're writing component software, and users may combine components in unthought-of complicated ways to do things, the speed of inlining and C++ seems to allow much more complex structures to be created.
Absolute control over the memory layout, alignment, and access when you need it. If you're careful enough you can write some very cache-friendly programs. For multi-processor programs, you can also eliminate a lot of slow downs from cache coherence mechanisms.
(Okay, you can do this in C, assembly, and probably Fortran too. But C++ lets you write the rest of your program at a higher level.)
This will probably not be a popular answer, but I think what sets C++ apart are its compile-time capabilities, e.g. templates and #define. You can do all sorts of text manipulation on your program using these features, much of which has been abandoned in later languages in the name of simplicity. To me that's way more important than any low-level bit fiddling that's supposedly easier or faster in C++.
C#, for instance, doesn't have a real macro facility. You can't #include another file directly into the source, or use #define to manipulate the program as text. Think about any time you had to mechanically type repetitive code and you knew there was a better way. You may even have written a program to generate code for you. Well, the C++ preprocessor automates all of these things.
The "generics" facility in C# is similarly limited compared to C++ templates. C++ lets you apply the dot operator to a template type T blindly, calling (for example) methods that may not exist, and checks-for-correctness are only applied once the template is actually applied to a specific class. When that happens, if all the assumptions you made about T actually hold, then your code will compile. C# doesn't allow this... type "T" basically has to be dealt with as an Object, i.e. using only the lowest common denominator of operations available to everything (assignment, GetHashCode(), Equals()).
C# has done away with the preprocessor, and real generics, in the name of simplicity. Unfortunately, when I use C#, I find myself reaching for substitutes for these C++ constructs, which are inevitably more bloated and layered than the C++ approach. For example, I have seen programmers work around the absence of #include in several bloated ways: dynamically linking to external assemblies, re-defining constants in several locations (one file per project) or selecting constants from a database, etc.
As Ms. Crabapple from The Simpson's once said, this is "pretty lame, Milhouse."
In terms of Computer Science, these compile-time features of C++ enable things like call-by-name parameter passing, which is known to be more powerful than call-by-value and call-by-reference.
Again, this is perhaps not the popular answer- any introductory C++ text will warn you off of #define, for example. But having worked with a wide variety of languages over many years, and having given consideration to the theory behind all of this, I think that many people are giving bad advice. This seems especially to be the case in the diluted sub-field known as "IT."
Passing POD structures across processes with minimum overhead. In other words, it allows us to easily handle blobs of binary data.
C# and Java force you to put your 'main()' function in a class. I find that weird, because it dilutes the meaning of a class.
To me, a class is a category of objects in your problem domain. A program is not such an object. So there should never be a class called 'Program' in your program. This would be equivalent to a mathematical proof using a symbol to notate itself -- the proof -- alongside symbols representing mathematical objects. It'll be just weird and inconsistent.
Fortunately, unlike C# and Java, C++ allows global functions. That lets your main() function to exist outside. Therefore C++ offers a simpler, more consistent and perhaps truer implementation of the the object-oriented idiom. Hence, this is one thing C++ can do, but C# and Java cannot.
I think that operator overloading is a quite nice feature. Of course it can be very much abused (like in Boost lambda).
Tight control over system resources (esp. memory) while offering powerful abstraction mechanisms optionally. The only language I know of that can come close to C++ in this regard is Ada.
C++ provides complete control over memory and as result a makes the the flow of program execution much more predictable.
Not only can you say precisely at what time allocations and deallocations of memory occurs, you can define you own heaps, have multiple heaps for different purposes and say precisely where in memory data is allocated to. This is frequently useful when programming on embedded/real time systems, such as games consoles, cell phones, mp3 players, etc..., which:
have strict upper limits on memory that is easy to reach (constrast with a PC which just gets slower as you run out of physical memory)
frequently have non homogeneous memory layout. You may want to allocate objects of one type in one piece of physical memory, and objects of another type in another piece.
have real time programming constraints. Unexpectedly calling the garbage collector at the wrong time can be disastrous.
AFAIK, C and C++ are the only sensible option for doing this kind of thing.
Well to be quite honest, you can do just about anything if your willing to write enough code.
So to answer your question, no, there is nothing you can't do in another language that C++ can't do. It's just how much patience do you have and are you willing to devote the long sleepless nights to get it to work?
There are things that C++ wrappers make it easy to do (because they can read the header files), like Office development. But again, it's because someone wrote lots of code to "wrap" it for you in an RCW or "Runtime Callable Wrapper"
EDIT: You also realize this is a loaded question.