Related
I have built a parser using a FSM/Pushdown Automaton approach like here (and it works, well!): C++ FSM design and ownership
It allows me to exit gracefully and output a helpful error message to the user when something goes wrong at the parser stage.
I have been wondering about a good way to get that done in the rest of my program, and naturally, the parser approach popped in my mind...
I would make every object a state, with a single event() function containing a switch statement that calls object-specific functions depending on the stage of execution I am in. I can keep track of that with object-specific enums, which keeps the code readable (case parser is more readable than case 5). This will allow me to close off the pushdown tree of states I have created (using the m_parent* approach in my other question).
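A minimal sketch of the design described above, to make the discussion concrete. The class name `Loader`, its stages, and the error strings are hypothetical; only the single `event()` function, the object-specific enum, and the `m_parent` pointer come from the description.

```cpp
#include <cassert>
#include <string>

// Hypothetical object modelled as a state machine: a single event() entry
// point, an object-specific enum for the stage, and a parent pointer for the
// pushdown chain (named m_parent after the question).
class Loader {
public:
    enum class Stage { Open, Parse, Done, Failed };

    explicit Loader(Loader* parent = nullptr) : m_parent(parent) {}

    // Advances one stage per call; returns false on failure or when the
    // object has already finished.
    bool event(const std::string& input) {
        switch (m_stage) {
        case Stage::Open:
            m_stage = input.empty() ? fail("empty input") : Stage::Parse;
            break;
        case Stage::Parse:
            m_stage = (!input.empty() && input.front() == '{')
                          ? Stage::Done : fail("expected '{'");
            break;
        case Stage::Done:
        case Stage::Failed:
            return false;
        }
        return m_stage != Stage::Failed;
    }

    Stage stage() const { return m_stage; }
    const std::string& error() const { return m_error; }
    Loader* parent() const { return m_parent; }

private:
    Stage fail(const std::string& why) { m_error = why; return Stage::Failed; }

    Stage m_stage = Stage::Open;
    std::string m_error;
    Loader* m_parent;
};

// Usage: drive the object with events and inspect where it stopped.
inline void demo() {
    Loader good;
    assert(good.event("{}") && good.event("{}"));
    assert(good.stage() == Loader::Stage::Done);

    Loader bad;
    assert(bad.event("x"));    // Open -> Parse
    assert(!bad.event("x"));   // Parse -> Failed
    assert(bad.error() == "expected '{'");
}
```

The error message falls out of the state that failed, which is the "graceful exit" benefit mentioned for the parser.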
Is this good design (forcing everything in a FSM-mode)? Is there a better way, and how much more complicated will it be (I find the FSM pretty easy to implement and test)?
Thanks for the suggestions!
PS: I know boost has about everything one may ever need, but I want to limit external dependencies, especially on boost. c++0x is ok though (but not really relevant here I think)
What you are doing is a bit like building a (simple) virtual machine in your programme. An FSM tends to be a good fit for some restricted problems such as lexing and parsing, and as you've probably noted, you can get quite a bit of logging and error management 'for free'.
However, if you try to apply the FSM pattern to everything (which is going to be tough for e.g. GUI programmes which contain quite a lot of state you normally wouldn't want to make into explicit states), you're going to realize that you also need facilities to debug your FSM (since the C++ debugger won't understand your states and events) and facilities to link and reuse states (since the states won't be OO level constructs). If you ever want to hand over your code to someone else, he or she is going to need additional training to use your FSM successfully. Are you going to want to keep one FSM engine for multiple applications? If so, how are you going to deal with versioning and upgrades?
Use the right tool for the right job. Every approach has its strengths and weaknesses. Your solution adds another layer of complexity: you can deal with logging and error handling in more C++-ish ways. If you're not happy with writing C++ code, you might consider other existing languages, rather than building an FSM language only you understand.
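One possible reading of "more C++-ish ways" is exceptions for error reporting plus an RAII guard for logging. The sketch below assumes that reading; `ScopeLog`, `log_lines`, `parse_positive`, and `try_parse` are made-up names, and the in-memory log stands in for a real sink.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log sink; a real program would write to a file or stderr.
inline std::vector<std::string>& log_lines() {
    static std::vector<std::string> lines;
    return lines;
}

// RAII guard: logs on entry and on exit, including when an exception unwinds
// the stack. (A production version should not allocate in the destructor.)
class ScopeLog {
public:
    explicit ScopeLog(std::string name) : m_name(std::move(name)) {
        log_lines().push_back("enter " + m_name);
    }
    ~ScopeLog() { log_lines().push_back("leave " + m_name); }
    ScopeLog(const ScopeLog&) = delete;
    ScopeLog& operator=(const ScopeLog&) = delete;

private:
    std::string m_name;
};

// Reports failure by throwing; no explicit error state is needed.
inline int parse_positive(const std::string& text) {
    ScopeLog guard("parse_positive");
    int value = std::stoi(text);  // throws std::invalid_argument on junk
    if (value <= 0) throw std::domain_error("value must be positive: " + text);
    return value;
}

// The caller turns any failure into a user-facing message in one place.
inline std::string try_parse(const std::string& text) {
    try {
        parse_positive(text);
        return "ok";
    } catch (const std::exception& e) {
        return std::string("error: ") + e.what();
    }
}

inline void demo() {
    assert(try_parse("42") == "ok");
    assert(try_parse("-1") == "error: value must be positive: -1");
    assert(log_lines().back() == "leave parse_positive");  // logged despite the throw
}
```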
Most people would use inheritance instead of switch/case/default. However, the idea of forcing everything to be one way is inherently wrong. You should always approach each required functionality on its own merits.
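As a rough illustration of inheritance replacing the switch, each stage can be a class and the virtual call does the dispatch. The stage names here are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Each stage is a class; the virtual call replaces the switch statement.
struct Stage {
    virtual ~Stage() = default;
    virtual std::string name() const = 0;
    // Returns the next stage, or nullptr when there is nothing left to do.
    virtual std::unique_ptr<Stage> next() const = 0;
};

struct Done : Stage {
    std::string name() const override { return "done"; }
    std::unique_ptr<Stage> next() const override { return nullptr; }
};

struct Parse : Stage {
    std::string name() const override { return "parse"; }
    std::unique_ptr<Stage> next() const override {
        return std::unique_ptr<Stage>(new Done);
    }
};

// Runs the stages in order and records the path taken.
inline std::string run(std::unique_ptr<Stage> s) {
    std::string path;
    while (s) {
        path += s->name();
        path += ';';
        s = s->next();
    }
    return path;
}

inline void demo() {
    assert(run(std::unique_ptr<Stage>(new Parse)) == "parse;done;");
}
```

Adding a stage means adding a class rather than editing every switch, which is the usual argument for this form.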
You can always take a look at boost.
And what's your suggestion to move to the next level of C++ programming for someone who may be called, well, an intermediate C++ programmer?
Intermediate Programmer: Understands ISO C++ reasonably well, can read and modify other's code with some luck, good with data structures and algorithms but not great
Learn C++0x
Learn what kind of assembly code gets generated for different construct types, maybe for x86
Forget language nuances and get the fundamentals -- automata theory from somewhere like Sipser or Papadimitriou
If you know OOP or at least think you do, consider how to incorporate functional programming skills with C++
Work on something on the lines of a compiler and open-source like LLVM or GNU Toolchain
The whole idea is busted -- the next level means more sophisticated data structures. So if you know AVL, consider learning left leaning red black trees et al
Now obviously nobody can do everything in this list without prioritizing, so we need some suggestion on what might be the best way forward.
NOTE: Thank you all for the very helpful responses.
I'd say you can do everything on the list, just not all at once. At least IMO, you're looking at things a bit backwards though. Learning C++ (or any other language) is a means to an end, not an end in itself.
Learning more advanced language techniques, more advanced data structures, etc., should mostly be done when and as needed to accomplish something. You certainly need a reasonable starting "base" to do much, but beyond a fairly small set of basics, most advanced techniques, data structures, etc., are also relatively specialized.
Instead of trying to learn something for its own sake, write some code. When something seems clumsy, unnecessarily difficult, inflexible, etc., find a better way to handle it. This way, you'll not only learn the more advanced technique, data structure, etc., but also get a good idea of what it really accomplishes, so you'll have a decent idea of when, how, and why to use it (and, just about as importantly, at least some idea of its limitations and when it's probably not applicable or useful).
To answer your specific questions:
Learn C++0x
You definitely need to do this. So possibly you have your answer right there...
Learn what kind of assembly code gets generated for different construct types, maybe for x86
I would say learn how to understand the assembly language the compiler generates, in outline if not in detail. You certainly should not be trying to predict what the compiler will do, however.
Forget language nuances and get the fundamentals -- automata theory from somewhere like Sipser or Papadimitriou
If it turns you on, I suppose...
If you know OOP or at least think you do, consider how to incorporate functional programming skills with C++
Of all of the paradigms C++ supports, functional programming is probably the worst supported - if you want to learn FP, learn Haskell (or whatever), not C++.
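For a feel of what functional style looks like in C++0x-era code, here is a small filter, map, fold pipeline written with lambdas and standard algorithms. The function name is made up; the intermediate vectors and iterator pairs are arguably the kind of friction the answer has in mind.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Sum of the squares of the even numbers, written as filter -> map -> fold.
inline int sum_even_squares(const std::vector<int>& xs) {
    std::vector<int> evens;
    std::copy_if(xs.begin(), xs.end(), std::back_inserter(evens),
                 [](int x) { return x % 2 == 0; });

    std::vector<int> squares(evens.size());
    std::transform(evens.begin(), evens.end(), squares.begin(),
                   [](int x) { return x * x; });

    return std::accumulate(squares.begin(), squares.end(), 0);
}

inline void demo() {
    assert(sum_even_squares({1, 2, 3, 4}) == 20);  // 2*2 + 4*4
}
```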
Work on something on the lines of a compiler and open-source like LLVM or GNU Toolchain
GNU is written in C, so it's not likely to boost your C++ skills - I know little about LLVM.
The whole idea is busted -- the next level means more sophisticated data structures. So if you know AVL, consider learning left leaning red black trees et al
RB trees are not much more sophisticated than AVL trees - same basic concept. If you understand the basic structures covered in a data structures textbook, I don't see the need to dig further, unless the subject particularly interests you.
I'd learn about BOOST.
You can start piecemeal, just by using it, and as you get deeper into the libraries, you will find yourself thinking "How does that work?".
Using it will make you a more productive and better C++ programmer!
Understanding how it works will get you a "guru" badge!!
Contributing to and extending it will ensure immortality!!!
If you know the basic language:
Then in this sort of order (though there will be some back tracking)
Learn, study, and digest RAII
Figure out how to use RAII in all C contexts so you are never stuck with C code.
Figure out Exceptions and what the exception guarantees are.
Figure out how to implement methods so that each of the different types of guarantees holds.
Learn about the standard containers.
Learn about the requirements required of each container.
Learn about iterators
Learn about iterator traits and how they work in conjunction with pointers.
Learn about the algorithms library
Learn about the stream library
Go back and learn how streams and iterators work
Learn about the method pointers and how they can be used in algorithms
Figure out what a functor is and how to use it.
Learn about bind and look at boost bind
Learn about the boost containers and how they differ from the standard containers.
Learn about smart pointers.
What are the different types and when to use each one effectively.
Start reading about the other things available in boost.
At this point you will be at the start of learning how to use C++
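The first few items (RAII, RAII around C code, exception guarantees) can be sketched together. This is only an illustration under assumed names (`FileCloser`, `FilePtr`, `open_file`, `first_line`); it wraps the C stdio API in a `std::unique_ptr` with a custom deleter.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Deleter so that std::unique_ptr closes the C stream for us.
struct FileCloser {
    void operator()(std::FILE* f) const { if (f) std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Opens a file or throws; once this returns, the handle cannot leak.
inline FilePtr open_file(const std::string& path, const char* mode) {
    FilePtr f(std::fopen(path.c_str(), mode));
    if (!f) throw std::runtime_error("cannot open " + path);
    return f;
}

// Reads the first line (up to 255 characters) without its newline.
// Returns an empty string at end of file.
inline std::string first_line(std::FILE* f) {
    char buf[256];
    if (!std::fgets(buf, sizeof buf, f)) return std::string();
    std::string line(buf);
    if (!line.empty() && line.back() == '\n') line.pop_back();
    return line;
}

inline void demo() {
    FilePtr tmp(std::tmpfile());          // closed automatically on scope exit
    assert(tmp != nullptr);
    std::fputs("hello\nworld\n", tmp.get());
    std::rewind(tmp.get());
    assert(first_line(tmp.get()) == "hello");
}
```

Because the stream is owned by `FilePtr`, every early return or exception path closes it, which is what makes the exception guarantees in the next items practical to reason about.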
Learning assembly (e.g. to write assembly) might be a good idea, but I strongly suggest you don't become attached to the particulars of what your compiler generates, as that will change from version to version and optimization level to optimization level.
I would be a strong proponent of #4. Learning functional programming is very valuable. I haven't done a whole lot of it in C++, so I don't know how natural a fit it is, but I love how Ruby and Scala do functional programming.
I suggest you go into the designing part of programming. Learn how to design, write good code, learn good programming practices. Design patterns, UML, unit tests belong here.
As one hardly does the same thing all the time I also recommend, as you said, the assembly language. Learning assembly is fun and it really makes you understand computers better. Nothing beats the feeling of knowing how computers work at the lowest level.
Having knowledge of both low and high level programming beats everything else.
Don't worry too much about C++0x right now... make sure you really really really understand the basics first. This means make sure you understand references, pointers, L-values, R-values, templates, inheritance, memory management, etc etc. I'm not just saying grab a basic understanding of these, I'm saying really know the C++ memory model and what each expression means.
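One quick self-test of the "really know what each expression means" point is value categories. The overload pair below (a hypothetical `category` function) reports whether its argument was an lvalue or an rvalue; the last case tends to surprise people.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>

// Overloads that report which value category the argument had.
inline const char* category(const std::string&) { return "lvalue"; }
inline const char* category(std::string&&) { return "rvalue"; }

inline std::string make() { return "temp"; }

inline void demo() {
    std::string s = "named";
    assert(std::strcmp(category(s), "lvalue") == 0);
    assert(std::strcmp(category(make()), "rvalue") == 0);
    assert(std::strcmp(category(std::move(s)), "rvalue") == 0);

    // A named rvalue reference is itself an lvalue expression.
    std::string&& r = make();
    assert(std::strcmp(category(r), "lvalue") == 0);
}
```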
I really like #4 and #6. In regards to #6, try coding up some really advanced data structures in C++. Nothing will make you learn the language faster than trying to solve the problems that advanced data structures entail.
I'd say the next step is to read Structure and Interpretation of Computer Programs from cover to cover and do the exercises.
Study how other people solve difficult problems in an elegant way. Very important: just practice, without forgetting to evaluate. Have your code or problem solving methods reviewed.
Yes (referring to point 4), learn other programming languages, especially those that have specific advantages over c++, rather than applying their techniques directly in C++. Focus on finding methods for yourself to code with as few errors disrupting your workflow as possible, find a calibrated systematic and abstract approach that you can always apply to problem solving and implementation.
Collect/build a set of tools/libraries and coding practices that allow you to stop reinventing the wheel and deal with all the most common tasks in the best ways. Because if you think about it, apart from bugs, readability, scalability and extensibility, and to a much lesser degree performance, if you write code that gets the job done, you have actually shown yourself to be a good programmer.
I aim at productivity here. If you are more into theory, you might as well just occupy yourself with little snippets and obscure language features.
For another point of view: 7. Learn other programming languages, as different from C++ as is practical. Definitely learn about functional programming, and don't worry about how it applies to C++ yet. Some languages to consider: Scheme, Common Lisp, Haskell, Prolog, Forth, Smalltalk. You don't have to become proficient in them, but you should try to understand how they work and what's good code in those languages.
If all you know is one language, your thinking will be restricted to what's natural in that language. If you know more than one, you'll be able to think in more different ways. You'll be more flexible in your approach to problems.
Anything you mentioned above will make you a better C++ programmer. You need to make a choice based on your career plan. For example, if you want to develop hardware drivers with C++, you should learn about the assembly code that gets generated.
Background:
I have been working on a platformer game written in C++ for a few months. The game is currently written entirely in C++, though I am intrigued by the possibility of using Lua for enemy AI and possibly some other logic. However, the project was designed without Lua in mind, and I have already written working C++ code for much of the AI. I am hoping Lua can improve the extensibility of the game, but don't know if it would make sense to convert existing C++ code into Lua.
The question:
When, if ever, is it appropriate to take fully functional C++ code and refactor it into a scripting language like Lua?
The question is intentionally a bit vague, so feel free to give answers that are not relevant to the given background.
Scripting languages are useful for things that might change frequently or be extended, and that can afford to trade away some speed.
It wouldn't make sense to use a scripting language in your core libraries, because those are relatively static (all they do is process stuff over and over) and need to be quick. But for things like AI, it's a perfect idea. You may tweak the AI without recompiling, and allow future changes quite nicely. Once you ship, you can pre-compile the scripting language and call it good.
It's also best for extensibility. Provide a Lua interface to your game and anybody can write plugins using a simple language, without the need for compiling. The more fleshed out your Lua interface, the more expressive and powerful those plugins can be.
If you've already got everything working, unless you intend on trying to improve it or allow extensions I don't really see a reason to strip it out; you're done. It would be something to keep in mind for your next game engine.
That said, if you're not completely done and this is a hobby/practice kind of thing, I would recommend you do. It will be your introduction into adding scripting capabilities to the game engine. When you get to making larger and more complex engines you won't need to worry about something new.
When, if ever, is it appropriate to take fully functional C++ code and refactor it into a scripting language like Lua?
Rarely. Here's when I've done it:
I wanted to change the design or add functionality in ways that would require me to revisit the C++ code anyway.
I found parts of the C++ code that I kept changing over and over.
I believed that by migrating from C++ to Lua that I could make the code five or ten times smaller.
The first two bullets are things anyone can do. The third requires some experience.
I am working on a project written in C++ which involves modification of existing code. The code uses object-oriented principles (design patterns) heavily and also complicated stuff like smart pointers.
While trying to understand the code using gdb, I had to be very careful about the various polymorphic functions being called by the various subclasses.
Everyone knows that the intent of using design patterns and other complicated stuff in your code is to make it more reusable, i.e. maintainable, but I personally feel that it is much easier to understand and debug procedure-oriented code, as you definitely know which function will actually be called.
Any insights or tips to handle such situations are greatly appreciated.
P.S: I am relatively less experienced with OOP and large projects.
gdb is not a tool for understanding code, it is a low-level debugging tool. Especially when using C++ as a higher level language on a larger project, it's not going to be easy to get the big picture from stepping through code in a debugger.
If you consider smart pointers and design patterns to be 'complicated stuff' then I respectfully suggest that you study their use until they don't seem complicated. They should be used to make things simpler, not more complex.
While procedural code may be simple to understand in the small, using object oriented design principles can provide the abstractions required to build a very large project without it turning into unmaintainable spaghetti.
For large projects, reading code is a much more important skill than operating a debugger. If a function is operating on a polymorphic base class then you need to read the code and understand what abstract operations it is performing. If there is an issue with a derived class' behaviour, then you need to look at the overrides to see if these are consistent with the base class contract.
If and only if you have a specific question about a specific circumstance that the debugger can answer should you step through code in a debugger. Questions might be something like 'Is this overridden function being called?'. This can be answered by putting a breakpoint in the overridden function and stepping over the call which you believe should be calling the overridden function to see if the breakpoint is hit.
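A small, made-up example of the kind of question meant here. `Circle::describe` really overrides the base function, while `Square::describe` only looks like it does; a breakpoint set with gdb's `break Square::describe` would never be hit by a call through the base class.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string describe(int) const { return "shape"; }
};

struct Circle : Shape {
    // A real override; `override` makes the compiler verify that.
    std::string describe(int) const override { return "circle"; }
};

struct Square : Shape {
    // Not an override: the parameter type differs, so this hides the base
    // function instead. Adding `override` here would be a compile error.
    std::string describe(long) const { return "square"; }
};

// Code written against the base class contract.
inline std::string via_base(const Shape& s) { return s.describe(1); }

inline void demo() {
    assert(via_base(Circle()) == "circle");
    assert(via_base(Square()) == "shape");   // the "override" is never called
}
```

Reading the base class declaration next to each derived class answers the same question without running anything, which is the answer's main point.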
Port it into Doxygen as a first step.
Modifying comments should have no effect on the code.
Doxygen will allow you to get an overview of the structure of the program.
Over time, as you figure out more about the program, you add comments that get picked up by Doxygen. The quality of the document grows over time, and will be helpful to the next poor SOB that gets stuck with the program
There is an excellent book called Object-Oriented Reengineering Patterns that, in a first part, provides patterns on how to understand legacy code (e.g. "refactor to understand").
A pdf version of the book is available for free at http://scg.unibe.ch/download/oorp/
Diagrams. Does your IDE have a tool that can reverse-engineer class diagrams from the code? That may help you understand the relationships between classes. Also, have the other developers actually written documentation on what they are doing and why? Is there a decisions document explaining why they designed and built in the way they did (Ok, sometimes this is not necessary - but if it exists, it would also help).
Also, do you know WHAT design patterns were used? Do they have names? Can you look them up and find other simpler examples of them? Maybe try writing a small app that also implements the design pattern, just to try it for yourself. That can also improve understanding.
I generally do the following:
Draw a simplified class diagram
Write some pseudocode
Ask a developer who is likely to be familiar with the code layout
How would you go about converting a reasonably large (>300K), fairly mature C codebase to C++?
The kind of C I have in mind is split into files roughly corresponding to modules (i.e. less granular than a typical OO class-based decomposition), using internal linkage in lieu of private functions and data, and external linkage for public functions and data. Global variables are used extensively for communication between the modules. There is a very extensive integration test suite available, but no unit (i.e. module) level tests.
I have in mind a general strategy:
Compile everything in C++'s C subset and get that working.
Convert modules into huge classes, so that all the cross-references are scoped by a class name, but leaving all functions and data as static members, and get that working.
Convert huge classes into instances with appropriate constructors and initialized cross-references; replace static member accesses with indirect accesses as appropriate; and get that working.
Now, approach the project as an ill-factored OO application, and write unit tests where dependencies are tractable, and decompose into separate classes where they are not; the goal here would be to move from one working program to another at each transformation.
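Steps 2 and 3 of this strategy might look roughly like the following. The module (`diag`), its functions, and the `Parser` client are hypothetical stand-ins for whatever the real code base contains.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Step 2: the former C module becomes a class whose members are all static.
// Before: static int error_count;  void diag_report(const char* msg);
class DiagStatic {
public:
    static void report(const std::string&) { ++s_error_count; }
    static int error_count() { return s_error_count; }

private:
    static int s_error_count;   // was a file-scope static variable
};
int DiagStatic::s_error_count = 0;

// Step 3: the same module as an instance; the state lives in the object.
class Diagnostics {
public:
    void report(const std::string& msg) { m_messages.push_back(msg); }
    int error_count() const { return static_cast<int>(m_messages.size()); }

private:
    std::vector<std::string> m_messages;
};

// A dependent module receives its cross-reference through the constructor
// instead of reaching for a global.
class Parser {
public:
    explicit Parser(Diagnostics& diag) : m_diag(diag) {}
    bool parse(const std::string& src) {
        if (src.empty()) { m_diag.report("empty source"); return false; }
        return true;
    }

private:
    Diagnostics& m_diag;
};

inline void demo() {
    Diagnostics d1, d2;          // two independent instances are now possible
    Parser p(d1);
    assert(!p.parse(""));
    assert(p.parse("int x;"));
    assert(d1.error_count() == 1);
    assert(d2.error_count() == 0);
}
```

Once a module is an instance, a unit test can construct it in isolation, which is what step 4 relies on.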
Obviously, this would be quite a bit of work. Are there any case studies / war stories out there on this kind of translation? Alternative strategies? Other useful advice?
Note 1: the program is a compiler, and probably millions of other programs rely on its behaviour not changing, so wholesale rewriting is pretty much not an option.
Note 2: the source is nearly 20 years old, and has perhaps 30% code churn (lines modified + added / previous total lines) per year. It is heavily maintained and extended, in other words. Thus, one of the goals would be to increase maintainability.
[For the sake of the question, assume that translation into C++ is mandatory, and that leaving it in C is not an option. The point of adding this condition is to weed out the "leave it in C" answers.]
Having just started on pretty much the same thing a few months ago (on a ten-year-old commercial project, originally written with the "C++ is nothing but C with smart structs" philosophy), I would suggest using the same strategy you'd use to eat an elephant: take it one bite at a time. :-)
As much as possible, split it up into stages that can be done with minimal effects on other parts. Building a facade system, as Federico Ramponi suggested, is a good start -- once everything has a C++ facade and is communicating through it, you can change the internals of the modules with fair certainty that they can't affect anything outside them.
We already had a partial C++ interface system in place (due to previous smaller refactoring efforts), so this approach wasn't difficult in our case. Once we had everything communicating as C++ objects (which took a few weeks, working on a completely separate source-code branch and integrating all changes to the main branch as they were approved), it was very seldom that we couldn't compile a totally working version before we left for the day.
The change-over isn't complete yet -- we've paused twice for interim releases (we aim for a point-release every few weeks), but it's well on the way, and no customer has complained about any problems. Our QA people have only found one problem that I recall, too. :-)
What about:
Compiling everything in C++'s C subset and getting that working, and
Implementing a set of facades leaving the C code unaltered?
Why is "translation into C++ mandatory"? You can wrap the C code without the pain of converting it into huge classes and so on.
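A facade of this kind could be as small as the sketch below. The `symtab_*` functions are an invented stand-in for an existing C module (in a real project they would come from its header inside `extern "C"`); the `SymbolTable` class is the only new code.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <stdexcept>
#include <string>

// --- Stand-in for the unaltered C module (hypothetical API) ---------------
extern "C" {
struct symtab { int count; };
symtab* symtab_create(void) {
    symtab* t = static_cast<symtab*>(std::malloc(sizeof(symtab)));
    if (t) t->count = 0;
    return t;
}
int symtab_add(symtab* t, const char* name) {   // returns 0 on success
    if (!name || !*name) return -1;
    ++t->count;
    return 0;
}
int symtab_size(const symtab* t) { return t->count; }
void symtab_destroy(symtab* t) { std::free(t); }
}

// --- C++ facade: owns the handle, turns error codes into exceptions -------
class SymbolTable {
public:
    SymbolTable() : m_tab(symtab_create()) {
        if (!m_tab) throw std::bad_alloc();
    }
    ~SymbolTable() { symtab_destroy(m_tab); }
    SymbolTable(const SymbolTable&) = delete;
    SymbolTable& operator=(const SymbolTable&) = delete;

    void add(const std::string& name) {
        if (symtab_add(m_tab, name.c_str()) != 0)
            throw std::invalid_argument("bad symbol name");
    }
    int size() const { return symtab_size(m_tab); }

private:
    symtab* m_tab;
};

inline void demo() {
    SymbolTable t;
    t.add("main");
    t.add("printf");
    assert(t.size() == 2);
}
```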
Your application has lots of folks working on it, and a need to not-be-broken.
If you are serious about large scale conversion to an OO style, what
you need is massive transformation tools to automate the work.
The basic idea is to designate groups of data as classes, and then
get the tool to refactor the code to move that data into classes,
move functions on just that data into those classes,
and revise all accesses to that data to calls on the classes.
You can do an automated preanalysis to form statistic clusters to get some ideas,
but you'll still need an application-aware engineer to decide what
data elements should be grouped.
A tool that is capable of doing this task is our DMS Software Reengineering
Toolkit.
DMS has strong C parsers for reading your code, captures the C code
as compiler abstract syntax trees, (and unlike a conventional compiler)
can compute flow analyses across your entire 300K SLOC.
DMS has a C++ front end that can be used as the "back" end;
one writes transformations that map C syntax to C++ syntax.
A major C++ reengineering task on a large avionics system gives
some idea of what using DMS for this kind of activity is like.
See technical papers at
www.semdesigns.com/Products/DMS/DMSToolkit.html,
specifically
Re-engineering C++ Component Models Via Automatic Program Transformation
This process is not for the faint of heart. But then anybody
that would consider manual refactoring of a large application
is already not afraid of hard work.
Yes, I'm associated with the company, being its chief architect.
I would write C++ classes over the C interface. Not touching the C code will decrease the chance of messing up and quicken the process significantly.
Once you have your C++ interface up; then it is a trivial task of copy+pasting the code into your classes. As you mentioned - during this step it is vital to do unit testing.
GCC is currently in mid-transition from C to C++. They started by moving everything into the common subset of C and C++, obviously. As they did so, they added warnings to GCC for everything they found, available under -Wc++-compat. That should get you through the first part of your journey.
For the latter parts, once you actually have everything compiling with a C++ compiler, I would focus on replacing things that have idiomatic C++ counterparts. For example, if you're using lists, maps, sets, bitvectors, hashtables, etc., which are defined using C macros, you will likely gain a lot by moving these to C++. Likewise with OO, you'll likely find benefits where you are already using a C OO idiom (like struct inheritance), and where C++ will afford greater clarity and better type checking on your code.
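To illustrate the struct inheritance case, here is a made-up expression tree written first in the C idiom (tagged base struct as first member, unchecked downcasts) and then with C++ inheritance. All type names are hypothetical.

```cpp
#include <cassert>

// C idiom: "derived" structs embed the base as their first member, a tag
// says which one it really is, and every downcast is an unchecked cast.
enum NodeKind { NODE_CONST, NODE_ADD };
struct CNode  { NodeKind kind; };
struct CConst { CNode base; int value; };
struct CAdd   { CNode base; const CNode* lhs; const CNode* rhs; };

inline int c_eval(const CNode* n) {
    switch (n->kind) {
    case NODE_CONST:
        return reinterpret_cast<const CConst*>(n)->value;
    case NODE_ADD: {
        const CAdd* a = reinterpret_cast<const CAdd*>(n);
        return c_eval(a->lhs) + c_eval(a->rhs);
    }
    }
    return 0;
}

// C++ counterpart: the compiler checks the hierarchy and does the dispatch.
struct Node {
    virtual ~Node() = default;
    virtual int eval() const = 0;
};
struct Const : Node {
    explicit Const(int v) : value(v) {}
    int eval() const override { return value; }
    int value;
};
struct Add : Node {
    Add(const Node& l, const Node& r) : lhs(l), rhs(r) {}
    int eval() const override { return lhs.eval() + rhs.eval(); }
    const Node& lhs;
    const Node& rhs;
};

inline void demo() {
    CConst c1 = { { NODE_CONST }, 2 };
    CConst c2 = { { NODE_CONST }, 3 };
    CAdd ca = { { NODE_ADD }, &c1.base, &c2.base };
    assert(c_eval(&ca.base) == 5);

    Const a(2), b(3);
    Add sum(a, b);
    assert(sum.eval() == 5);
}
```

In the C version a wrong tag silently reads the wrong struct; in the C++ version that mistake cannot be expressed.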
Your list looks okay except I would suggest reviewing the test suite first and trying to get that as tight as possible before doing any coding.
Let's throw another stupid idea:
Compile everything in C++'s C subset and get that working.
Start with a module, convert it in a huge class, then in an instance, and build a C interface (identical to the one you started from) out of that instance. Let the remaining C code work with that C interface.
Refactor as needed, growing the OO subsystem out of C code one module at a time, and drop parts of the C interface when they become useless.
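The second step (an instance behind an unchanged C interface) might be sketched like this. `StringPool` and the `strpool_*` functions are hypothetical; the point is that the remaining C code keeps calling the same names while the implementation is already a C++ object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The module after conversion: an ordinary C++ class.
class StringPool {
public:
    // Returns a stable id for the string, adding it on first sight.
    int intern(const std::string& s) {
        for (std::size_t i = 0; i < m_strings.size(); ++i)
            if (m_strings[i] == s) return static_cast<int>(i);
        m_strings.push_back(s);
        return static_cast<int>(m_strings.size() - 1);
    }
    int count() const { return static_cast<int>(m_strings.size()); }

private:
    std::vector<std::string> m_strings;
};

// The single instance that backs the legacy interface.
inline StringPool& legacy_pool() {
    static StringPool pool;
    return pool;
}

// Same names and signatures the remaining C code already calls (assumed).
// Exceptions must not escape into C callers, so they become an error code.
extern "C" int strpool_intern(const char* s) {
    try {
        return s ? legacy_pool().intern(s) : -1;
    } catch (...) {
        return -1;
    }
}
extern "C" int strpool_count(void) { return legacy_pool().count(); }

inline void demo() {
    assert(strpool_intern("foo") == 0);
    assert(strpool_intern("bar") == 1);
    assert(strpool_intern("foo") == 0);     // already known
    assert(strpool_intern(nullptr) == -1);  // error code, not an exception
}
```

When the last C caller is gone, the two `extern "C"` wrappers and `legacy_pool()` can simply be deleted.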
Probably two things to consider besides how you want to start are on what you want to focus, and where you want to stop.
You state that there is a large code churn; this may be a key to focusing your efforts. I suggest you pick the parts of your code where a lot of maintenance is needed. The mature/stable parts are apparently working well enough, so it is better to leave them as they are, except probably for some window dressing with facades etc.
Where you want to stop depends on what the reason is for wanting to convert to C++. This can hardly be a goal in itself. If it is due to some 3rd party dependency, focus your efforts on the interface to that component.
The software I work on is a huge, old code base which has been 'converted' from C to C++ years ago now. I think it was because the GUI was converted to Qt. Even now it still mostly looks like a C program with classes. Breaking the dependencies caused by public data members, and refactoring the huge classes with procedural monster methods into smaller methods and classes never has really taken off, I think for the following reasons:
There is no need to change code that is working and that does not need to be enhanced. Doing so introduces new bugs without adding functionality, and end users don't appreciate that;
It is very, very hard to refactor reliably. Many pieces of code are so large and also so vital that people hardly dare touch them. We have a fairly extensive suite of functional tests, but sufficient code coverage information is hard to get. As a result, it is difficult to establish whether there are already sufficient tests in place to detect problems during refactoring;
The ROI is difficult to establish. The end user will not benefit from refactoring, so it must be in reduced maintenance cost, which will increase initially because by refactoring you introduce new bugs in mature, i.e. fairly bug-free code. And the refactoring itself will be costly as well ...
NB. I suppose you know the "Working effectively with Legacy code" book?
You mention that your tool is a compiler, and that: "Actually, pattern matching, not just type matching, in the multiple dispatch would be even better".
You might want to take a look at maketea. It provides pattern matching for ASTs, as well as the AST definition from an abstract grammar, and visitors, transformers, etc.
If you have a small or academic project (say, less than 10,000 lines), a rewrite is probably your best option. You can factor it however you want, and it won't take too much time.
If you have a real-world application, I'd suggest getting it to compile as C++ (which usually means primarily fixing up function prototypes and the like), then work on refactoring and OO wrapping. Of course, I don't subscribe to the philosophy that code needs to be OO structured in order to be acceptable C++ code. I'd do a piece-by-piece conversion, rewriting and refactoring as you need to (for functionality or for incorporating unit testing).
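For reference, these are some of the typical edits needed to get C code through a C++ compiler. The snippets are invented examples; each comment shows the C original that would be rejected as C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// 1. C converts void* implicitly; C++ needs a cast.
//    C:   int* xs = calloc(n, sizeof *xs);
inline int* make_ints(std::size_t n) {
    return static_cast<int*>(std::calloc(n, sizeof(int)));
}

// 2. C++ keywords used as identifiers must be renamed.
//    C:   struct decl { int class; };
struct decl { int storage_class; };

// 3. String literals are const in C++.
//    C:   char* name = "main";
inline const char* entry_name() { return "main"; }

// 4. An empty parameter list means "no parameters" in C++, so old-style
//    declarations need the real signature.
//    C:   int handler();  ...  handler(1, 2);
inline int handler(int a, int b) { return a + b; }

// 5. Integers do not convert to enum types implicitly.
//    C:   enum color c = (c + 1) % 2;
enum color { RED, GREEN };
inline color next_color(color c) { return static_cast<color>((c + 1) % 2); }

inline void demo() {
    int* xs = make_ints(4);
    assert(xs != nullptr && xs[3] == 0);
    std::free(xs);

    decl d = { 2 };
    assert(d.storage_class == 2);
    assert(std::strcmp(entry_name(), "main") == 0);
    assert(handler(1, 2) == 3);
    assert(next_color(RED) == GREEN);
}
```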
Here's what I would do:
Since the code is 20 years old, scrap the parser/syntax analyzer and replace it with C++ code based on one of the newer lex/yacc/bison (or similar) tools, which is much more maintainable and easier to understand. It is faster to develop too if you have a BNF handy.
Once this is retrofitted to the old code, start wrapping modules into classes. Replace global/shared variables with interfaces.
Now what you have will be a compiler in C++ (not quite though).
Draw a class diagram of all the classes in your system, and see how they are communicating.
Draw another one using the same classes and see how they ought to communicate.
Refactor the code to transform the first diagram to the second. (this might be messy and tricky)
Remember to use C++ code for all new code added.
If you have some time left, try replacing data structures one by one to use the more standardized STL or Boost.