How essential is polymorphism for writing a text editor? - c++

Many years ago when I didn't know much about object oriented design I heard one guy said something like "How can you write a text editor without polymorphism?" I didn't know much about OOP and so I couldn't judge how wise that though was or ask any specific questions at that time.
Now, after many years of software development (mostly C++), I've used polymorphism many times to solve various problems when designing software. Yet I've never created text editors. So I still can't evaluate that guy's idea.
Is using polymorphism so essential for implementing a text editor in object-oriented languages and why?

Polymorphism for writing a text editor is by no means essential. In fact, polymorphism for solving any programming problem is not essential. It's just one way to do it. Sometimes it makes solving certain kinds of problems easier, and sometimes it just gets in the way.
The evidence for this is that there are perfectly usable text editors developed long before "OOP" became popular.

I would say "no", because it's entirely possible to write perfectly good text editors in non-object-oriented languages, so it can't be that essential.
Polymorphism is a great technique for the problems it addresses, but it's by no means the golden hammer for everything that ails a software developer.

This is a term that was thrown around a lot when OO programming was the rage. This guy was probably trying to intimidate you with large words, I doubt if he fully understood what he was saying although it is a simple concept when explained.
Any here lies the crux of the argument - how many times would you have to write, maintain or extend a text editor - none - therefore imho an OO paradigm is of little use in a for what is a relatively simple piece of code that needs to be highly efficient.

Many of the design patterns like Memento, Flyweight etc that may be used to design/implement Text Editor require inheritance and polymorphism.

The other points about polymorphism as being just a tool are spot on.
However if "the guy" did have some experience with writing text editors he may well have been talking about using polymorphism in the implementation of a document composition hierarchy.
Basically this is just a tree of objects that represent the structure of your document including details such as formatting (bold, italic etc) coloring and so on.
(Most web browsers implement something similar in the form of the browser Document Object Model (DOM), although there is certainly no requirement that they use polymorphism.)
Each of these objects inherits from a common base class (often abstract) that defines a method such as Compose().
Then when it is time to display or to update the structure of the document, the code simply traverses the tree calling the concrete Compose() on each object. Each object is then repsonsible for composing and rendering itself at the appropriate location in the document.
This is a classic use of polymorphism because it allows new document "components" to be added (or changed) without any (or minimal) change to the main application code.
Once again though, there are many ways to build a text manipulation program, polymorphism is definitely not required to build one.

I once wrote a text editor in Basic. It wasn't a sophisticated text editor by any means, it's big highlight being a text-mode windowing thing used for some menus and dialogs, but it still did it's job at the time - ie it proved I could write a text editor in Basic. I even used it sometimes. I won't be showing the source in public - it's just too embarrassing!
When your text editor is mostly just inserting/deleting characters in a big array of strings and displaying them, little or no abstraction is needed other than the usual provided-as-standard abstractions of arrays and strings.
On the other hand, the amount of text that a text editor on a PC is expected to cope with has increased a lot over the last 20 years, sometimes to the extent that even a modern PC with multiple Gigabytes may not be able to keep the whole file in RAM. On top of that there are character set and encoding issues. A good text editor is expected to remember a (potentially large) number of bookmarks into multiple files, and to maintain them so they refer to the same point despite edits. And then there's syntax highlighting, the ability to record/playback macros, and more.
In short, modern text editors are much more complex than the things used in DOS and on other micros twenty years ago. That complexity is no doubt much easier to manage with a good toolkit for handling abstractions.

While a simple text editor (below edit.com from MS-DOS) may be realized easier in a static class only (because the functionality is very limited), as soon as you get to menus and dialogs, you will find yourself in dire need of object oriented language features.
Personally, I frown upon procedural code anyway - I prefer a mixture of OOP (program structure, separation of functionality, etc...) and functional programming (implementation).
This may sound like a religious argument of some sort, but I find my personal style quite recommendable. Usually I need far less lines of code (which are much easier to understand) than most of the developers I work with and my code feels much more "agile" and "flexible".
Try it. :-)
Oh - and polymorphy is not hard to understand. Simply imagine that you (as a person) can be handled as:
a) Man or woman
b) European, asian, american, african, oceanian (I hope this is right), etc...
c) By your name
d) By your occupation
But still you are a person - and a living being, and a part of the universe... and YOU.
So for someone who asks you for statistical reasons a few questions, you may be handled as a, say, woman from oceania (I don't know where you come from, but lets just assume) who is, hm, 42 years old and lived in, hm, Switzerland for 23 years (hahaha).
For your employer, you may be competent in programming and talking to your collegues.
However, HOW you fill those roles is dependent on your implementation. This is you.

Is using polymorphism so essential for implementing a text editor in object-oriented languages and why?
Depends on what kind of text editor you're talking about.
You can write notepad without OOP. But you most likely will need OOP for something like MS Word or OpenOffice.
Design Patterns: Elements of Reusable Object-Oriented Software uses text editor for examples (i.e. "case study") of Design Pattern application. You may want to check out the book.

Related

Cleanest data structure to use when interpreting data from neatly-structured user commands (in C++) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I would like to write a simple in-house program that parses user commands written in a language of our team's own invention (but based closely on another program we are already familiar with). The command parser that I am working on now will simply be the UI through which the user can run the other algorithms I have already written. (Those other algorithms, by the way, are used to generate the input files for a molecular dynamic simulation package called LAMMPS.) The only thing I really have left to do is just write this UI, but as it turns out, writing your own scripting language is almost an intractable challenge for a non software engineer to tackle on his own.
According to the answers I received, what I am try to make would be considered a Domain Specific Language, and it is not advisable to try to make one's own DSL due to the enormous amount of work required to make it useful and bug-free.
The best option then would actually be to use an existing scripting language like Lua or Python, and embed it in the program.
To do this, I will most likely use Lua because it seems most fitting for our needs. So at this point, the rest of this question is no longer relevant since the answer would be: "Don't do it yourself." But I'm still going to keep part of it here for other users to be able read and learn from the wonderful answers below.
Thanks again to everyone who replied!
Old Question:
I would like to write a program that parses a user text input and then
runs a function corresponding to that input. To do this I would need
to parse the string for relevant keywords. I believe there will be
less than 15 keywords when I'm done, so ideally I'd like this code
to be simple and short.
The problem is that I am currently using if-statements to parse the
strings. This is an extremely inconvenient way to parse commands
because even for a short 3 word commands the code explodes into nested-ifs
3 layers deep. So longer 8+ word sentences will become nested-ifs more than
8 layers deep.
This kind of programing approach quickly becomes unmanageable, especially
when I need to make any significant changes to a command.
My question is whether or not there exists a data structure in C++ that
can help me better manage my giant nested-ifs, or if anyone could suggest
a better way to parse a string for lots of different data types (i.e.
substings, ints, and floats) and output an error message when the expected
type is not found?
Here is an example of a short user session to show the kinds of commands
I would like to interpret:
load "Basis.Silicon" as material 1
add material 1 to layer 1
rotate layer 1 about x-axis by 45 degrees
translate layer 1 in x-axis by 10 nm
generate crystal
These commands are based on an already-existing program that our team
uses, but unfortunately the source code for this program has never been
publicly released so I am left guessing as to how it was actually
implemented.
One final note, unlike natural language processors, I know exactly what
the format of each line will be. So my issue isn't so much how to interpret
the text, but rather how to code the logic in a concise and manageable way.
Thanks everyone!
Your question is not clear. And your goals are more difficult than what you believe.
Either you consider that you want to somehow process human language sentences (e.g. in English). Then you want to study natural language processing, and you can find some libraries related to that field.
Or you consider that you want to interpret some formal programming or scripting language. Then you want to study interpreters and compilers. BTW, in that case, you might just embed an existing interpreter (like Lua, Guile, Python, etc....) in your program.
You could also think in terms of expert systems with a knowledge base made of rules (this approach could be viewed as in the middle between NLP and scripting language) You'll then need some inference engine (perhaps CLIPS). See also J.Pitrat's blog.
Notice that even coding a simple interpreter is more difficult than you believe. You absolutely need to represent abstract syntax trees, which you construct from textual input with a parsing phase.
BTW, All of NLP, expert systems, and interpreter design and implementation are difficult fields. You could get a PhD in all 3 fields (but you have to choose which).
If you go the embedded interpreter way: study the interpreters I mentioned (Guile, Lua, Python, Neko, etc...) and choose which one you want, to embed.
If for whatever reason, you want to make an interpreter from scratch: Learn several programming languages first (including scripting languages like Ruby, Python, Ocaml, Scheme, Lua, Neko, ...). Read books on Programming Language Pragmatics (by M.Scott) and Lisp In Small Pieces (by Queinnec). Read also text books on compilation and parsing, and on Garbage Collection and formal (e.g. denotational) semantics. All this may need a dozen years of work.
Notice that by experience embedding a software in an interpreter is a very structuring design. If you did not thought of that at the beginning you probably need to redesign and refactor a lot your existing application. For instance, when embedding a software in an interpreter, you cannot afford that bad input crashes the program. So error handling and memory management (interfacing to the GC of the interpreter) is challenging and gives new constraints. Hence you'll need to re-think your application.
If all this is new (and even if you don't choose e.g. Guile as the embedding interpreter): learn and practice a bit of Scheme -e.g. with Guile or PltScheme- (e.g. reading SICP), read a little bit about λ-calculus and closures, then read Queinnec's Lisp In Small Pieces book. Remember the halting problem (which is partly why interpreters are difficult to code).
BTW the syntax you are proposing (e.g. rotate mat 1 by x 90) is not very readable and looks COBOL-like. If possible, have a language which looks familiar to existing ones. Make it easy to read !
Start by reading all the wikipages I am referencing here.
FWIW, I am the main author of MELT, a domain specific language (inspired a lot by Scheme) to extend the GCC compiler. Some of the papers / documentations I wrote might inspire you (and contain valuable references).
Addenda (after question was reformulated)
You seems to invent some formal syntax like
add material 1 to layer 1
rotate layer 1 about x-axis by 90 degrees
translate layer 1 in x-axis by 10 inches
I can't guess what kind of language is it? Are you implementing a 3D printer? If yes, you should stick to some existing standard formal language in that domain.
I believe that such a COBOL-like syntax is really wrong. The point is that it is too verbose, and that you are wishing to implement some domain specific language. I find your example very bad-looking.
Is that syntax your invention, or is there some document specifying (and many thousands already existing lines coded in) your domain specific language. If you are just inventing it, please reconsider the syntax and the semantics.
First, you need to specify on paper the full syntax and semantics of your DSL.
Is your DSL Turing complete? (I guess that yes, because Turing completeness is reached very quickly - e.g. with variables and loops....). If yes, you are inventing a scripting language. Please don't invent scripting language without knowing several programming & scripting languages (then read Programming Language Pragmatics...). The point is that, if your scripting language will become successful, advanced users will soon or later write important programs in it (e.g. many thousand lines). Then, these advanced users will be programmers. In that case, it is very important (for social & economic reasons) to have a DSL well founded and looking familiar (if possible, an extension of some existing scripting language).
If your DSL already exists, stick to its specification on paper. If that specification is not good enough, improve it with formalization (e.g. by writing some BNF syntax, and some formal (e.g. denotational) semantics for it). Publish and discuss that formalization with existing users.
Several industries got some ad-hoc DSLs which became widely used but was ill designed
(e.g., in the French nuclear industry, the Gibiane DSL designed in the 1970s by nuclear physicists, not computer scientists; the US Boeing corporation is also rumored to have made similar mistakes). Then, maintaining and improving the many hundred thousands lines of DSL scripts is becoming a nightmare (and may means losing millions of dollars or euros). So you better stick to some existing scripting language. The advantages are that there exist some culture on it (e.g. you can find dozens of books on Python or Lua, and many trained engineers familiar with them), that the interpreter is widely used and tested, that the community working on them is improving the interpreters, so it has quite few uncorrected bugs.
You should not attempt to design and implement your own DSL if you are not a trained computer scientist. Stick to some existing scripting language (of course their syntax is not like you want it to be), and leverage on existing implementations and experiment.
As a counter-example, J.Ousterhout has invented the widely used Tcl scripting language, with the claim that scripts are always small (e.g. hundreds of line only) and won't grow to big code base; unfortunately, some of them did, and Tcl is known as a bad language to code many dozens of thousands of lines (even if Tcl is an easy and convenient language for tiny scripts). The moral of the story is that if a (turing complete) scripting language is becoming successful, some "crazy" advanced user will code hundred of thousands of script code. So you need that scripting language to be well designed from the start. Hence, you should adopt and adapt a good existing scripting language (and avoid inventing an unfamiliar syntax without having a good knowledge of several existing scripting languages)
later additions
PS: my criticism of Tcl is not entirely subjective: the point is that Tcl was designed for small scripts in mind (read J.Ousterhout's first papers about Tcl), but my point is that when you offer a Turing-complete scripting language, some "crazy" user will eventually write huge scripts for it. Hence, you need to anticipate such "crazy" usage by offering a scripting language which "scales up" to big scripts, so is built according to software engineering practices for large software code base.
NB. Lua is probably a good choice as a language to embed. It is small, has a nice implementation, is well documented, and has good performance. But be careful about memory management issues (and this advice holds for any scripting language).
EDIT: To be more clear, I would like to have a short list of key words
(<15). The order/presence of which would determine which function will
be run.
You can build a small ruleset engine (e.g. something that processes lists of words). You write that engine/function once and just pass the data structures to it.
As an alternative, a solution using regular expressions would be probably the fastest to code (the engine is ready for you), assuming you're familiar with the regexp syntax (if not, it's still a good investment).
You could build a table of keywords and function pointers:
typedef void (*Function_Pointer)(void);
struct table_entry
{
const char * keyword;
Function_Pointer p_function;
};
table_entry function_table[] =
{
{"car", Process_Car},
{"bike", Process_Bike},
};
Search the table for a keyword. If the keyword is found, dereference the function pointer.
The following snippet will execute the function for processing the word "car":
(function_table[0].p_function)();
There is a famous program, called Eliza, which parses sentences for keywords.
Examples can be found at: Eliza C++ examples

How to read code without any struggle

I am a new to professional development. I mean I have only 5 months of professional development experience. Before that I have studied it by myself or at university. So I was looking over questions and found here a question about code quality. And I got a question related to it myself. How do I increase my code understanding/reading skills? Also will it improve the code quality I will write? Is there better code notation than Hungarian one? And is there any really good books for C++ design patterns(or the language doesn't matter?)?
Thank you in advance answering these questions and helping me improving :)
P.S. - Also I have forgot to tell you that I am developing with C++ and C# languages.
There is only way I've found to get better at reading other peoples code and that is read other peoples code, when you find a method or language construct you don't understand look it up and play with it until you understand what is going on.
Hungarian notation is terrible, very few people use it today, it's more of an in-joke among programmers.
In fact the name hungarian notation is a joke itself as:
"The term Hungarian notation is
memorable for many people because the
strings of unpronounceable consonants
vaguely resemble the consonant-rich
orthography of some Eastern European
languages."
From How To Write Unmaintainable Code
"Hungarian Notation is the tactical
nuclear weapon of source code
obfuscation techniques; use it! Due to
the sheer volume of source code
contaminated by this idiom nothing can
kill a maintenance engineer faster
than a well planned Hungarian Notation
attack."
And the ever popular linus has a few words to say on the matter.
"Encoding the type of a function into
the name (so-called Hungarian
notation) is brain damaged—the
compiler knows the types anyway and
can check those, and it only confuses
the programmer."
- Linus Torvalds
EDIT:
Taken from a comment by Tobias Langner.
"For the differences between Apss Hungarian and Systems Hungarian see Joel on Software".
Joel on Software has tips on how to read other people code called Reading Code is Like Reading the Talmud.
How do I increase my code
understanding/reading skills?
Read read read. Learn from your mistakes. Review answers on SO and elsewhere. When you can think back on a piece of code you wrote and go "aha! I should've done xyz instead!" then you're learning. Read a good book for your language of choice, get beyond the basics and understand more advanced concepts.
Then, apart from reading: write write write! Coding is like math: you won't fully grock it without actually solving problems. Glancing at a math problem's solution is different than getting out a blank piece of paper and solving it yourself.
If you can, do some pair programming too to see how others code and bounce ideas around.
Also will it improve the code quality
I will write?
See above. As you progress you should get more efficient. It won't happen by reading a book on design patterns. It will happen by solving real world problems and understanding why what you read works.
Is there better code notation than
Hungarian one?
It depends. Generally I avoid them and use descriptive names. The one exception where I might use Hungarian type of notations is for UI elements such as Windows Forms or ASP.NET controls, for example: using btn as a prefix for a Submit button (btnSubmit), txt for a TextBox (txtFirstName), and so on but it differs from project to project depending on approach and patterns utilized.
With regards to UI elements, some people like to keep things alphabetical and may append the control type at the end, so the previous examples become submitButton and firstNameTextBox, respectively. In Windows Forms many people name forms as frmMain, which is Hungarian, while others prefer naming it based on the application name or form purpose, such as MainForm, ReportForm, etc.
EDIT: be sure to check out the difference between Apps Hungarian and Systems Hungarian as mentioned by #Tobias Langner in a comment to an earlier response.
Pascal Case is generally used for method names, classes, and properties, where the first letter of each word is capitalized. For local variables Camel Case is typically used, where the first letter of the first word is lowercase and subsequent words have their first letters capitalized.
You can check out the naming conventions and more from the .NET Framework Design Guidelines. There is a book and some of it is on MSDN.
And is there any really good books for
C++ design patterns(or the language
doesn't matter?)?
Design patterns should be applicable to any language. Once you understand the concept and the reasoning behind that pattern's usefulness you should be able to apply it in your language of choice. Of course, don't approach everything with a "written in stone" attitude; the pattern is the goal, the implementation might differ slightly between languages depending on language features available to you. Take the Decorator pattern for example, and see how C# extension methods allow it to be implemented differently than without it.
Design Pattern books:
Head First Design Patterns - good beginner intro using Java but code is available for C++ and C# as a download (see "book code and downloads" section on the book's site)
Design Patterns: Elements of Reusable Object-Oriented Software - classic gang of four (GOF)
Patterns of Enterprise Application Architecture - Martin Fowler
If you're looking for best practices for quality coding in C++ and C# then look for the "Effective C++" and "More Effective C++" books (by Scott Meyers) and "Effective C#" and "More Effective C#" books (by Bill Wagner). They won't hold your hand along the way though, so you should have an understanding of the language in general. There are other books in the "Effective" series so make sure you see what's available for your languages.
I'm sure you can do a search here for other recommended reading so I'll stop here.
EDIT: added more details under the Hungarian Notation question.
I can't speak for everyone else, but in my experience I've found that the best thing I learned about making readable and/or in general better code was reading (and ultimately cleaning) a lot of other people's code. Some people may disagree with me but I think it's invaluable. Here's my reasoning:
When you start programming, its difficult to determine what is crap vs. not crap vs. good. Being logical, rational and extremely intelligent help in making good code, but even those factors don't always contribute. By reading others works and doing the dirty work, you'll have gone through the experience of what works and what doesn't. Eventually, you'll be able to mentally navigate those minefields that others had to cross and you'll be prepared to avoid those identical minefields.
By reading other's works, you gain insight into their mind and how they tackle a problem. Just from an architecture or technique aspect, this can be very useful to you whether their
tactics were good or bad. By reading other peoples successful or unsuccessful implementation, you've gained that knowledge without putting in the actual time it took them to learn it.
Design patterns are extremely useful. Only time and experience with them will help you in knowing what the appropriate pattern for whichever problem. Again, read other peoples' code for this if they've successfully built some pattern that may be useful for you.
When dealing with extreme problems where people's work falls short, you learn to research and dive into the internals of whatever system/language/platform/framework you're working with. This research ability on your own is very useful when all else fails. But you'll never know when to start looking or for what until you get through the crud of other people's work. Good code or bad, it's all valuable in some form or fashion.
All these notations and formats and nomenclature are helpful, but can be learned or implemented rather quickly and their payoff is fairly substantial. By reading code from other people, you'll develop your own style of logic. As you encounter other peoples work and the tremendous amount of effort it takes to read through, you'll learn what logical pitfalls to avoid and what to implement the next time for yourself or even how to fix bad code even faster.
I've never felt as if I was a great programmer. Not to say I'm a bad one either, but I feel confident in my abilities as my experience has taught me so much and my ability to adapt to every situation is what makes me a solid programmer. Learning from other people and their code has helped me. Whether their work was good or bad, there's always something you can take from them and their experience, add it to your memories, knowledge, etc.etc.
Ask other people to read your code! Try and see if you get a fellow coworker or similar to have a code review with you. Having someone else comb through your code and ask you questions about your code will provide new insights and critiques to your style and techniques. Learn from your mistakes and others.
Just to give you a bit of encouragement, I've been a professional programmer for 30 years now, and I still find reading other people's code very difficult. The major reason for this, unfortunately, is that the quality of the code follows Sturgeons Law - 90% of it is crap. So don't think it's your fault if you find it hard going!
The biggest improvement in readability of my code came about when I started liberally using white space.
I found this article on Joel on Software to be very relevant to the Hungarian notation debate.
It seems that the original intent of the notation was to encode type information that wasn't immediately obvious- not whether a variable is an int (iFoo), but what kind of int it is- such as a distance in centimeters (cmFoo). That way, if you see "cmFoo = mBar" you can tell that it's wrong because even though both are ints, one is meters and the other is centimeters, and thus the logic of the statement is incorrect, even though the syntax is fine. (Of course, I personally prefer to use classes such that that statement wouldn't even compile, or would do the conversion for you).
The problem was at some point people started using it without understanding it, and the world was cursed with lots of dwFoos and bBars.
Everyone needs to read that article- Making Wrong Code Look Wrong
How do I increase my code
understanding/reading skills?
Reading code is like dancing by yourself. You need a partner, and I suggest a debugger.
Stepping through code with a debugger is a true, lively dance. I recommend getting a quality, open-source project in the language of your choice, and then step through with the debugger. Concepts will come alive if you ask "why did that happen?", "what should happen next?".
One should ultimately be able to reason about code without a debugger; don't let it become a crutch. But that said, it is a very valuable tool.
Reading code is a lot like reading literature in that you need to have some insight into the author sometimes to understand what you're looking at and what to expect. They only way to improve your comprehension skills is by reading as much code and possible and trying to follow along.
I think a lot of what is mentioned here is applicable to coding...
Reading and understanding skills are a question of time. You will improve them as you get more experienced. Also depends in the quality of the code you are reading.
Keep in mind that sometimes it's not the best idea to learn directly from what you see at work. Books will teach you the best practices and you will be able to adapt them to yourself with the experience.
I am reading the Head First Design Patterns at present and it is very useful. It presents the information in a novel way that is easy to understand. Nice to see they have a C# version for download.
There will always be struggles with reading code unless you are Jon Skeet. This isn't to say that it is a big problem, rather that unless you can eat, sleep, breathe in that programming language, it will always take a little time to digest code. Looking at other people's code is certainly a good suggestion for helping in some ways but remember that there are many different coding conventions out there and some may be more enforced than others, e.g. interface names start with an I to give a simple example. So, I guess I'm saying that even with Visual Studio and Resharper, there is still a little work to understand a few lines of code since I can't quite write out sentences in C# yet.
1) Educate yourself. Read relevant literature.
2) Write code
3) Read code
4) Read relevant blogs.
Visit http://hanselminutes.com . He is a programmer from microsoft. Even though you don't program on microsoft stack, it's good to read through. There is a podcast in there that answers this question.
Another suggestion is to make sure you have the appropriate tools for the job before you start digging into a piece of code. Trying to understand a code-base without the ability to search across the entire set of files is extremely difficult.
Granted, we very rarely have the entire set of files, especially in large projects, but whatever boundaries you draw, you should have good visibility and searchability across those files. Whatever lies outside those boundaries can be considered 'black box' and perhaps lies outside the scope of your grokking.
There are many good open source editors including Eclipse and the CDT. Spending some time learning how to effectively create projects, search across projects, and enable any IDE-specific tooltips/helpers can make a world of difference.

Write C++ in a graphical scratch-like way?

I am considering the possibility of designing an application that would allow people to develop C++ code graphically. I was amazed when I discovered Scratch (see site and tutorial videos).
I believe most of C++ can be represented graphically, with the exceptions of preprocessor instructions and possibly function pointers.
What C++ features do you think could be (or not be) represented by graphical items?
What would be the pros and cons of such an application ? How much simpler would it be than "plain" C++?
RECAP and MORE:
Pros:
intuitive
simple for small applications
helps avoid typos
Cons:
may become unreadable for large (medium?) - sized applications
manual coding is faster for experienced programmers
C++ is too complicated a language for such an approach
Considering that we -at my work- already have quite a bit of existing C++ code, I am not looking for a completely new way of programming. I am considering an alternate way of programming that is fully compatible with legacy code. Some kind of "viral language" that people would use for new code and, hopefully, would eventually use to replace existing code as well (where it could be useful).
How do you feel towards this viral approach?
When it comes to manual vs graphical programming, I tend to agree with your answers. This is why, ideally, I'll find a way to let the user always choose between typing and graphical programming. A line-by-line parser (+partial interpreter) might be able to convert typed code into graphical design. It is possible. Let's all cross our fingers.
Are there caveats to providing both typing and graphical programming capabilities that I should think about and analyze carefully?
I have already worked on template classes (and more generally type-level C++) and their graphical representation.
See there for an example of graphical representation of template classes. Boxes represent classes or class templates. First top node is the class itself, the next ones (if any) are typedef instructions inside the class. Bottom nodes are template arguments. Edges, of course, connect classes to template arguments for instantiations.
I already have a prototype for working on such type-level diagrams.
If you feel this way of representing template classes is plain wrong, don't hesitate to say so and why!
Much as I like Scratch, it is still much quicker for an experienced programmer to write code using a text editor than it is to drag blocks around, This has been proved time and again with any number of graphical programming environments.
Writing code is the easiest part of a developers day. I don't think we need more help with that. Reading, understanding, maintaining, comparing, annotating, documenting, and validating is where - despite a gargantuan amount of tools and frameworks - we still are lacking.
To dissect your pros:
Intuitive and simple for small applications - replace that with "misleading". It makes it look simple, but it isn't: As long as it is simple, VB.NET is simpler. When it gets complicated, visual design would get in the way.
Help avoid typos - that's what a good style, consistency and last not least intellisense are for. The things you need anyway when things aren't simple anymore.
Wrong level
You are thinking on the wrong level: C++ statements are not reusable, robust components, they are more like a big bag of gears that need to be put together correctly. C++ with it's complexity and exceptions (to rules) isn't even particulary suited.
If you want to make things easy, you need reusable components at a much higher level. Even if you have these, plugging them together is not simple. Despite years of struggle, and many attempts in many environments, this sometimes works and often fails.
Viral - You are correct IMO about that requriement: allow incremental adoption. This is closely related to switching smoothly between source code and visual representation, which in turn probably means you must be able to generate the visual representation from modified source code.
IDE Support - here's where most language-centered approaches go astray. A modern IDE is more than just a text editor and a compiler. What about debugging your graph - with breakpoints, data inspection etc? Will profilers, leak detectors etc. highlight nodes in your graph? Will source control give me a Visual Diff of yesterday's graph vs. today's?
Maybe you are on to something, despite all my "no"s: a better way to visualize code, a way to put different filters on it so that I see just what I need to see.
The early versions of C++ were originally written so that they compiled to C, then the C was compiled as normal.
What it sounds like you are describing is a graphical language that is compiled to C++, which will then be compiled as normal.
So really you are not creating a graphical C++, you are creating a new language that happens to be graphical. Nothing wrong with that, but don't let C++ restrict what you do, because eventually you may want to compile the graphical language straight to machine code, or even to something like CIL, Java ByteCode, or whatever else tickles your fancy.
Other graphical languages you may want to check out are LabVIEW, and more generally the category of visual programming languages.
Good luck in your efforts.
The complexity of a nontrivial program is usually too high to be represented with graphical symbols, which are low in their information content. Unless your approach is markedly different in some way, I am skeptical that this would be of value based on past efforts.
So, practically speaking, his will be useful only for instructional purposes and very simple programs. But that would still be a great target market for a product like this. sometimes people have trouble grasping the fundamentals, and a visual model might be just the thing to help things click.
Interesting idea. I doubt I'd use it though. I tend to prefer coding in a flat text editor, not even an IDE, and for tough problems I prefer a pad of paper. Most of the really experienced programmers I know work this way, Maybe it's because we grew up in a different environment, but I think it's also because of the way we think about programming. As you get more experience, you start seeing the code in your head more clearly than any GUI tool will show it to you.
As for your question, I'd nominate templates as one of the harder / more interesting sort of thing to try to represent well. They are ubiquitous and carry information that you won't have access to as you are designing your tool. Getting that to the user in a useful way should pose an interesting challenge.
What C++ features do you think could be [...] represented by graphical items?
Object Oriented Design. Hence classes, inheritance, polymorphism, mutability, const-ness etc. And, templates.
What would be the pros and cons of such an application?
It may be easier for beginners to start writing programs. For the experienced, it may be get rid of the boring parts of programming.
Think of any other code generator. They create a framework for you to write the more involved portion(s). They also lead to bloated-code (think of any WYSIWYG HTML editor).
The biggest challenge, as I see it, is that any such UI necessarily hinders the user's imagination.
How much simpler would it be than "plain" c++ ?
It can be a real pain, when you wade through truckloads of errors which is typical of code generators.
Further, since a lot of code is generated, you have no idea of what is going on -- debugging becomes difficult.
Also, for the experienced there may be some irritation to find that the generated code is not per their preferred coding style.
I prefer hot-keys instead graphical menus and buttons.
And I think same thing will happen with graphical development tool. Many peoples will prefer manual codding.
But, source code visualizer - should be nice thing.
I like the idea, but I suspect there comes a point where things get far too complicated to be represented graphically.
However, given recent experience at work; it would be useful to give such a graphical interface to a non-techie person to use to create basic drag-and-drop programs, leaving myself free to get on with some "proper" programming ;-) If it can do the job of allowing somebody non-skilled to build something functional it can be a very good thing (even if programming logic escapes them)
There comes a point in such a system where it becomes easier to define what you want to do using literal C++ code, rather than have a user interface getting in the way; it can get frustrating to the sessioned programmer knowing the precise code that needs to be written but then only being limited to the design GUI. I'm specifically thinking about a more common application, such as html editors/designers in which they allow newbies to build their websites without knowing any html at all.
It would be interesting to see how such a system would handle the dynamic allocation of memory, and the different states of a program as time progressed; I suspect that there are some very basic programming concepts that may be difficult to represent graphically.. Polymorphism..? Virtual Classes, LinkList, Stacks/Circular Queues. I wonder for a moment how you would explain an image compression algorithm (such as jpg) successfully too without the help of a gigantic display screen.
I also wonder if such a system would even go to such a low level, and whether you would be dealing with abstracted concepts and the compiler would be working out the best way to do something.
I've been working on a new model-driven software development paradigm named ABSE (http://www.abse.info) that supports end-user programming: It's a template-based system that can be complemented with transformation code. I also have an IDE (named AtomWeaver) implementing ABSE that is in pre-alpha stage right now.
With AtomWeaver, as an expert/architect, you build your knowledge Templates, and then the developers (or end-users if you make your meta-models simpler) can just "assemble" systems by building blocks, and then filling template parameters in form-style editors.
At the end, pressing the "Generate" button will create the final system as specified by the architect/expert.
I'm surprised you think function pointers would be a particular problem. How about anything at all to do with pointers?
A programming language can be represented by a hierarchy of nodes - that's exactly what the compiler turns it into. It is very strange that the UI for editing programs is still a sequence of characters that get parsed, because the degrees of freedom in the editor is way larger than the available set of allowed choices. But intellisense helps to reduce this problem a lot.
C++ would be a strange choice to base such a system on.
I think the major problem of this kind of IDEs are that the code generated becomes unmantainable easily.
This happened to Delphi. It's a really nice tool to develop some kind of applications, however, when we start adding complex relationships between the components, start adding Design Patterns, etc. the code grows to an unmantainable size.
I believe it's also because graphical tools don't apply the concept of MVC (or if they do, it's only in the way that the IDE understands).
It can be really helpful for prototypes and very small applications that don't tend to grow, otherwise it can become a mess for the developer(s)

What project would you recommend me to get up to speed with C++ [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I know that C++ is a very complex language that takes many years of practice to master.
Taking that into account do you know of a small project (around a 1k of loc) that tests all of C++ major features (inheritance, pointers, memory management, etc).
The thing is I'm a Java/Python programmer and I really want to learn C++ so I've been studying C++ for a while but haven't tested anything of what I've learned beyond small exercises.
I want to take all of that knowledge and put into practice.
Doing this alone you will obtain many harmful habits. It's much better to get an internship with a company that has high competence in C++ development and train under guidance.
C++ is like a grenade without a safety pin – looks cool and you've heard that all "real professionals" use it, but you don't know when it is to explode. A tremendous amount of features that can be used for good or for evil without knowing whether it's really good or evil. That's why guidance is a must here.
A memory manager. That should get you thinking about:
free store management
pointers (of course!)
inheritance (you will want your driver code to use this)
templates (another way to pass the manager around -- driver #2)
designing user defined data structures (a memory block)
efficient use of standard container(s)
algorithms (to move around, figure out empty blocks, defragment)
Effective C++ and More Effective C++
Other than that, pick a (small?) personal project you want to write and do it in C++. You are not going to learn C++ by reading a 1000 line project.
I'm not sure about anything that tests all major features. There are a lot of them, and some are rarely used together (templates and virtual functions come to mind. Both achieve a form of polymorphism, so you often use one or the other depending on your needs.)
A suitable project in that it'd touch on all the important features might be something apparently simple like writing a correct container class, similar to std::vector or std::list. Ensure exception safety, iterator validity, the appropriate time complexity on all operations and every other requirement specified in the standard.
The problem with this, as well as most other projects, is that you won't really know when you're done. Making a resizable array might take 50 lines of code, and 20 minutes of your time. And then a beginner would think he's done. Making it exception-safe requires you to be able to spot all the places where the class might be thrown into an inconsistent state by an exception.
That's a kind of general problem with C++. It's easy enough to think you get it, and the compiler certainly won't notify you of aspects you've forgotten to handle. So you might think your code is perfect, and yet it'll crash for all sorts of odd special cases.
As sharptooth said, for a language as messy as C++, writing code on your own is risky. It is easy to fall into the trap of "I've written some code, it compiles and it seems to run. Therefore it is correct".
Of course you could post your code here or on other sites for review, or maybe just supplement your coding with reading the docs for actual high quality C++ code (most boost libraries tend to have comprehensive documentation, specifying both the rationale for various design decisions, and how it safely handles all the weird special cases that tend to crop up in C++. The C++ standard itself would be another excellent resource, of course. In either case, these might help you determine what problems to look out for)
When I was learning C++, I used it to write my own language for writing Colossal Cave style adventures. Like most computer languages it never saw the light of day, but it did teach me a lot about C++.
Whatever you choose the thing to avoid when learning C++ is GUI programming, which is a trap which will drain all your gumption and probably teach you bad C++ habits in the process.
I'd recommend creating a text based game. That really helped me firm up my C++. Doesn't take too long and you can exercise all the features you want. Come up with the game yourself. It is more fun that way.
Another great idea is to write a simple mathematics library, supporting Vectors Matrices etc.
But with todays libraries, that is only of academic use.
In order to learn C++ it is useful to look at a lot of well written C++ code.
I think the Qt library is quite nice for this so I suggest: Write an Qt application.
Look how they use C++ and create your own graphical components in a similar fashion.
Ideas:
- Stock chart viewer widget that connects to one of the financial websites and scrapes history data.
- Simple Excel like spreadsheet widget.
Depends on what area you want to work in. But nothing worth doing correctly comes in at less than 1000 lines of code.
If you are going to be writing games then try writing a Tetris clone.
If you think you will be using sockets etc then writing a simple chat/irc client would help.
Do you have a specific itch that needs to be scratched? When was the last time you thought "this sucks, I could do better?". Can you do better?
I would recomend writing a Tetris clone.
You can learn a lot of c++ concepts with this and learn a 2d library like SDL or maybe even OpenGL throgh SDL.
It is always good to have a project with visual results and at the end of it you can play it.
There seem to be two themes coming from the answers:
You need to pick a project that might involve more than 1K LOC in order to get the true experience of the project.
You need to also pair up with an experienced C++ developer, who can help you think through problems and avoid pitfalls associated with the language.
You can get around both of these by swing by sourceforge.net and signing up to help with an existing C++ project. As long as you don't mind your code being open source, you should be able to find an existing project to learn from plus experienced developers who can help by reviewing your code and offering guidance.
An interactive world:
A matrix where each position can be a Void or a Being.
A Being is something with a few attributes: age, Time left, gender, neigbor connections, etc. Capable a few interactions: fights, having sex and kids, friendships, etc. Some have special Skills, depending on their fathers (inherited trades)... like ability to kill, ability to make food,, etc...
Possible outcomes of those interactions and skills are changes on the self attributes, or creating offspring (when possible), or change neigbor attributes.
At each iteration, print the matrix as symbols/numbers on the console (depending on the attributes, etc), starting from a Biblical iteration 0 (initial conditions of your choice... you're God here).
Now you got some real-life pattern simulator, and learned something about inheritance, polimorfism, virtual functions, instantiation of classes, etc.
I would suggest a simple text editor would be a reasonable goal.
It's a problem domain that you have a good grasp of.
You have memory management issues, library class reuse issues (stl/curses?), pointer issues, and lots of options where derived classes can be used.
For polymorphism, perhaps, you can have the editor read from a keyboard, or suck commands in from a text file.
There is another good one.... dealing with files.
You don't have to cross platform it. You don't have to open source it. You don't have to show it to anyone. You don't even have to finish it. It can be an exercise just for you.
If you're learning fron a book, it must have plenty of well-thought-out exercises you can implement and learn from. Also check out the university sites and their C++ labs / assignments.

References Needed for Implementing an Interpreter in C/C++

I find myself attached to a project to integerate an interpreter into an existing application. The language to be interpreted is a derivative of Lisp, with application-specific builtins. Individual 'programs' will be run batch-style in the application.
I'm surprised that over the years I've written a couple of compilers, and several data-language translators/parsers, but I've never actually written an interpreter before. The prototype is pretty far along, implemented as a syntax tree walker, in C++. I can probably influence the architecture beyond the prototype, but not the implementation language (C++). So, constraints:
implementation will be in C++
parsing will probably be handled with a yacc/bison grammar (it is now)
suggestions of full VM/Interpreter ecologies like NekoVM and LLVM are probably not practical for this project. Self-contained is better, even if this sounds like NIH.
What I'm really looking for is reading material on the fundamentals of implementing interpreters. I did some browsing of SO, and another site known as Lambda the Ultimate, though they are more oriented toward programming language theory.
Some of the tidbits I've gathered so far:
Lisp in Small Pieces, by Christian Queinnec. The person recommending it said it "goes from the trivial interpreter to more advanced techniques and finishes presenting bytecode and 'Scheme to C' compilers."
NekoVM. As I've mentioned above, I doubt that we'd be allowed to incorporate an entire VM framework to support this project.
Structure and Interpretation of Computer Programs. Originally I suggested that this might be overkill, but having worked through a healthy chunk, I agree with #JBF. Very informative, and mind-expanding.
On Lisp by Paul Graham. I've read this, and while it is an informative introduction to Lisp principles, is not enough to jump-start constructing an interpreter.
Parrot Implementation. This seems like a fun read. Not sure it will provide me with the fundamentals.
Scheme from Scratch. Peter Michaux is attacking various implementations of Scheme, from a quick-and-dirty Scheme interpreter written in C (for use as a bootstrap in later projects) to compiled Scheme code. Very interesting so far.
Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, recommended in the comment thread for Books On Creating Interpreted Languages. The book contains two chapters devoted to the practice of building interpreters, so I'm adding it to my reading queue.
New (and yet Old, i.e. 1979): Writing Interactive Compilers and Interpreters by P. J. Brown. This is long out of print, but is interesting in providing an outline of the various tasks associated with the implementation of a Basic interpreter. I've seen mixed reviews for this one but as it is cheap (I have it on order used for around $3.50) I'll give it a spin.
So how about it? Is there a good book that takes the neophyte by the hand and shows how to build an interpreter in C/C++ for a Lisp-like language? Do you have a preference for syntax-tree walkers or bytecode interpreters?
To answer #JBF:
the current prototype is an interpreter, and it makes sense to me as we're accepting a path to an arbitrary code file and executing it in our application environment. The builtins are used to affect our in-memory data representation.
it should not be hideously slow. The current tree walker seems acceptable.
The language is based on Lisp, but is not Lisp, so no standards compliance required.
As mentioned above, it's unlikely that we'll be allowed to add a full external VM/interpreter project to solve this problem.
To the other posters, I'll be checking out your citations as well. Thanks, all!
Short answer:
The fundamental reading list for a lisp interpreter is SICP. I would not at all call it overkill, if you feel you are overqualified for the first parts of the book jump to chapter 4 and start interpreting away (although I feel this would be a loss since chapters 1-3 really are that good!).
Add LISP in Small Pieces (LISP from now on), chapters 1-3. Especially chapter 3 if you need to implement any non-trivial control forms.
See this post by Jens Axel Søgaard on a minimal self-hosting Scheme: http://www.scheme.dk/blog/2006/12/self-evaluating-evaluator.html .
A slightly longer answer:
It is hard to give advice without knowing what you require from your interpreter.
does it really really need to be an interpreter, or do you actually need to be able to execute lisp code?
does it need to be fast?
does it need standards compliance? Common Lisp? R5RS? R6RS? Any SFRIs you need?
If you need anything more fancy than a simple syntax tree walker I would strongly recommend embedding a fast scheme subsystem. Gambit scheme comes to mind: http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_Page .
If that is not an option chapter 5 in SICP and chapters 5-- in LISP target compilation for faster execution.
For faster interpretation I would take a look at the most recent JavaScript interpreters/compilers. There seem to be a lot of thought going into fast JavaScript execution, and you can probably learn from them. V8 cites two important papers: http://code.google.com/apis/v8/design.html and squirrelfish cites a couple: http://webkit.org/blog/189/announcing-squirrelfish/ .
There is also the canonical scheme papers: http://library.readscheme.org/page1.html for the RABBIT compiler.
If I engage in a bit of premature speculation, memory management might be the tough nut to crack. Nils M Holm has published a book "Scheme 9 from empty space" http://www.t3x.org/s9fes/ which includes a simple stop-the-world mark and sweep garbage collector. Source included.
John Rose (of newer JVM fame) has written a paper on integrating Scheme to C: http://library.readscheme.org/servlets/cite.ss?pattern=AcmDL-Ros-92 .
Yes on SICP.
I've done this task several times and here's what I'd do if I were you:
Design your memory model first. You'll want a GC system of some kind. It's WAAAAY easier to do this first than to bolt it on later.
Design your data structures. In my implementations, I've had a basic cons box with a number of base types: atom, string, number, list, bool, primitive-function.
Design your VM and be sure to keep the API clean. My last implementation had this as a top-level API (forgive the formatting - SO is pooching my preview)
ConsBoxFactory &GetConsBoxFactory() { return mConsFactory; }
AtomFactory &GetAtomFactory() { return mAtomFactory; }
Environment &GetEnvironment() { return mEnvironment; }
t_ConsBox *Read(iostream &stm);
t_ConsBox *Eval(t_ConsBox *box);
void Print(basic_ostream<char> &stm, t_ConsBox *box);
void RunProgram(char *program);
void RunProgram(iostream &stm);
RunProgram isn't needed - it's implemented in terms of Read, Eval, and Print. REPL is a common pattern for interpreters, especially LISP.
A ConsBoxFactory is available to make new cons boxes and to operate on them. An AtomFactory is used so that equivalent symbolic atoms map to exactly one object. An Environment is used to maintain the binding of symbols to cons boxes.
Most of your work should go into these three steps. Then you will find that your client code and support code starts to look very much like LISP too:
t_ConsBox *ConsBoxFactory::Cadr(t_ConsBox *list)
{
return Car(Cdr(list));
}
You can write the parser in yacc/lex, but why bother? Lisp is an incredibly simple grammar and scanner/recursive-descent parser pair for it is about two hours of work. The worst part is writing predicates to identify the tokens (ie, IsString, IsNumber, IsQuotedExpr, etc) and then writing routines to convert the tokens into cons boxes.
Make it easy to write glue into and out of C code and make it easy to debug issues when things go wrong.
The Kamin Interpreters from Samuel Kamin's book Programming Languages, An Interpreter-Based Approach, translated to C++ by Timothy Budd. I'm not sure how useful the bare source code will be, as it was meant to go with the book, but it's a fine book that covers the basics of implementing Lisp in a lower-level language, including garbage collection, etc. (That's not the focus of the book, which is programming languages in general, but it is covered.)
Lisp in Small Pieces goes into more depth, but that's both good and bad for your case. There's a lot of material on compiling and such that won't be relevant to you, and its simpler interpreters are in Scheme, not C++.
SICP is good, definitely. Not overkill, but of course writing interpreters is only a small fraction of the book.
The JScheme suggestion is a good one, too (and it incorporates some code by me), but won't help you with things like GC.
I might flesh this out with more suggestions later.
Edit: A few people have said they learned from my awklisp. This is admittedly kind of a weird suggestion, but it's very small, readable, actually usable, and unlike other tiny-yet-readable toy Lisps it implements its own garbage collector and data representation instead of relying on an underlying high-level implementation language to provide them.
Check out JScheme from Peter Norvig. I found this amazingly simple to understand and port to C++. Uh, dunno about using scheme as a scripting language though - teaching it to jnrs is cumbersome and feels dated (helloooo 1980's).
I would like to extend my recommendation for Programming Languages: Application and Interpretation. If you want to write an interpreter, that book takes you there in a very short path. If you read through writing the code you read and doing the exercise you end up with a bunch of similar interpreters but different (one is eager, the other is lazy, one is dynamic, the other has some typing, one has dynamic scope, the other has lexical scope, etc).