Is there any reason why the expression
(foo5 (foo4 (foo3 (foo2 (foo1 arg)))))
cannot be replaced with
(foo5 (foo4 (foo3 (foo2 (foo1 arg)-)
or the like, and then expanded back?
I know lack of reader macros means that you cannot change syntax, but can this expansion possibly be hard-coded into the Java?
I do this when I hand write code.
Yes, you could do this, even without reader macros (in fact, you can change Clojure's syntax with a bit of hacking).
But the question is, what would it gain you? Would it always expand to top level? But then cutting and pasting code would fail if you moved it to or from top level. And, of course, all the various tools that operate on Clojure syntax would need to understand it.
Ultimately, if you really dislike all the closing parens, why not use
(-> arg foo1 foo2 foo3 foo4)
instead?
Yes, this could be done, but I'm not sure it is the right solution and there are a number of negatives which will likely outweigh the benefits.
Suggestions like this are often the result of poor coding tools and a 'traditional' conceptual model for writing code. Selecting the right tools and looking at your code from a slightly different perspective will usually eliminate the cause that leads to this type of suggestion.
Most of the non-functional, non-lispy languages are based around a token-and-line model of code. You tend to think of the code in terms of lines of tokens and you tend to edit the code on this basis. There is typically less nesting of expressions and lines are usually terminated with some marker, such as a semicolon. Likewise, tools such as your editor have features which have evolved to support token- and line-based editing. They are good at it.
The Lisp-style languages are less focused on lines of tokens. The emphasis here is on list forms. Lines of tokens are replaced with nested lists of symbols - the line is less relevant and you typically have a lot more nesting of forms. This change means your standard line-oriented tools, like your editor, are less suitable. The typical mental model of the code as lines of tokens is also less useful.
With languages like Clojure, you're better off thinking in terms of list forms and not lines of code. Once you make this transition, you then start looking for tools which also model the code along these lines. For example, you either look for editors specifically designed to work with lists of data rather than lines of data, or you look for editors which have extensions that allow you to work with lists.
Once your editor understands that lists, not lines, are the fundamental grouping unit, things like parentheses become largely irrelevant from a code writing/editing perspective. You don't worry about closing parentheses, counting nesting levels, etc. This all gets managed by the editor automatically. You don't move by lines, you move by lists; you don't kill/delete a line, you kill a list; you don't cut and copy a block of lines, you cut and copy a list of lists, etc.
The good news is that in many respects, the structure of these list-based code representations is actually easier to manipulate than that of most line-based languages. This is primarily because there is less ambiguity or complexity. There are fewer exceptions to the rules and the rules are inherently simple. As a consequence, many editors designed for programmers have support for this style of coding as well as advanced features which are difficult to implement in less structured code.
My suspicion is that your suggestion to have an additional bit of syntactic sugar to avoid having to type multiple closing parentheses is actually a symptom of not having the right tools to write your code. Once you do, you will almost never need to enter a closing parenthesis or count opening parens to ensure you get the nesting right. This will be handled by the editor. Your biggest challenge will be in shifting your mental model to think in terms of lists and lists of lists. The parens will become largely invisible and you will jump around in your code by list units rather than line units. The change is not easy and it can take some time to re-train your brain and fingers, but once you do, you will likely be surprised at how quickly you begin to edit and manipulate your code.
If you're an Emacs user, I highly recommend extensions such as paredit and lispy. If you're using some other editor, look for paredit-type extensions. However, as these are extensions, you must also spend some time training yourself to use whatever key bindings the extension provides - there is no point having an extension with great list-based code navigation if you still just move around with the arrow keys (unless it is Emacs and you have re-bound those arrow keys to use the paredit navigation bindings).
I'm currently experimenting with a programming language. I defined the basic syntax and wrote a pretty simple parser some months ago. Today I wanted to continue the project, but after a short time there was something about the syntax that bothered me.
The end of statement
When I started the project I thought using line breaks as the end of a statement would be nice, just like that:
public fnc addPerson: Person personInstance
{
[this.collection.add: personInstance]
return this
}
Now, today, I think it would look and feel much better using semicolons, which would also allow putting the entire thing on one line.
public fnc addPerson: Person personInstance
{
[this.collection.add: personInstance]; return this;
}
I really wonder: from an objective (not technical) perspective, what are the pros and cons of each?
I mean, using line breaks will force the developer (at least a bit) to write clean code, but it makes the thing pretty inflexible.
What kinds of problems will I probably run into (as a user of the language) with line breaks as the end of statement?
What language-feature limitations will I have to accept with line breaks or with semicolons?
We had all this before, with assembler, RPG, COBOL, and various other tabular languages where line terminators were significant. Harder to write compilers for. We don't need to go back there.
When this was first done back then, everybody realized the need for a statement continuation indicator, so you could break statements over multiple lines. Now that Scala et al. have reintroduced this, they've forgotten that part of it, so it becomes impossible to present long statements in an acceptable format.
Not a good idea. Whitespace is whitespace, not syntax.
• I would expect, for most people, line breaks are easier to read
• Using line breaks would introduce readability problems for very long lines
• I don't see how one would impose any limitations over the other
The big problem with using newline as statement separator is not that you can't write multiple statements on one line (you shouldn't do that anyway, it makes it too easy to miss an important part of code).
The problem is that it makes it hard to write one long statement over several lines. For a language where this becomes a problem, have a look at JavaScript and its automatic semicolon insertion.
I would like to write a simple in-house program that parses user commands written in a language of our team's own invention (but based closely on another program we are already familiar with). The command parser that I am working on now will simply be the UI through which the user can run the other algorithms I have already written. (Those other algorithms, by the way, are used to generate the input files for a molecular dynamics simulation package called LAMMPS.) The only thing I really have left to do is write this UI, but as it turns out, writing your own scripting language is an almost intractable challenge for a non-software engineer to tackle on his own.
According to the answers I received, what I am trying to make would be considered a Domain Specific Language, and it is not advisable to try to make one's own DSL due to the enormous amount of work required to make it useful and bug-free.
The best option then would actually be to use an existing scripting language like Lua or Python, and embed it in the program.
To do this, I will most likely use Lua because it seems most fitting for our needs. So at this point, the rest of this question is no longer relevant since the answer would be: "Don't do it yourself." But I'm still going to keep part of it here so other users can read and learn from the wonderful answers below.
Thanks again to everyone who replied!
Old Question:
I would like to write a program that parses a user text input and then
runs a function corresponding to that input. To do this I would need
to parse the string for relevant keywords. I believe there will be
less than 15 keywords when I'm done, so ideally I'd like this code
to be simple and short.
The problem is that I am currently using if-statements to parse the
strings. This is an extremely inconvenient way to parse commands
because even for a short 3-word command the code explodes into nested-ifs
3 layers deep. So longer 8+ word sentences will become nested-ifs more than
8 layers deep.
This kind of programing approach quickly becomes unmanageable, especially
when I need to make any significant changes to a command.
My question is whether or not there exists a data structure in C++ that
can help me better manage my giant nested-ifs, or if anyone could suggest
a better way to parse a string for lots of different data types (i.e.
substrings, ints, and floats) and output an error message when the expected
type is not found?
Here is an example of a short user session to show the kinds of commands
I would like to interpret:
load "Basis.Silicon" as material 1
add material 1 to layer 1
rotate layer 1 about x-axis by 45 degrees
translate layer 1 in x-axis by 10 nm
generate crystal
These commands are based on an already-existing program that our team
uses, but unfortunately the source code for this program has never been
publicly released so I am left guessing as to how it was actually
implemented.
One final note, unlike natural language processors, I know exactly what
the format of each line will be. So my issue isn't so much how to interpret
the text, but rather how to code the logic in a concise and manageable way.
Thanks everyone!
Your question is not clear, and your goals are more difficult than you believe.
Either you consider that you want to somehow process human language sentences (e.g. in English). Then you want to study natural language processing, and you can find some libraries related to that field.
Or you consider that you want to interpret some formal programming or scripting language. Then you want to study interpreters and compilers. BTW, in that case, you might just embed an existing interpreter (like Lua, Guile, Python, etc....) in your program.
You could also think in terms of expert systems with a knowledge base made of rules (this approach could be viewed as sitting in the middle between NLP and a scripting language). You'll then need some inference engine (perhaps CLIPS). See also J. Pitrat's blog.
Notice that even coding a simple interpreter is more difficult than you believe. You absolutely need to represent abstract syntax trees, which you construct from textual input with a parsing phase.
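To make that concrete for the command language in the question, even a very small AST will do. The following is only a hedged sketch in C++: the type and field names are invented for illustration and are not taken from any existing code.

#include <string>
#include <vector>

// Hypothetical node for one parsed command such as
// "rotate layer 1 about x-axis by 45 degrees".
struct CommandNode
{
    std::string verb;                   // e.g. "load", "rotate", "translate"
    std::vector<std::string> arguments; // the remaining tokens, already split
};

// A whole script is then just the sequence of nodes produced by the parsing phase.
typedef std::vector<CommandNode> Script;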
BTW, all of NLP, expert systems, and interpreter design and implementation are difficult fields. You could get a PhD in any of these 3 fields (but you have to choose which).
If you go the embedded-interpreter way: study the interpreters I mentioned (Guile, Lua, Python, Neko, etc.) and choose which one you want to embed.
If for whatever reason you want to make an interpreter from scratch: learn several programming languages first (including scripting languages like Ruby, Python, OCaml, Scheme, Lua, Neko, ...). Read books on Programming Language Pragmatics (by M. Scott) and Lisp In Small Pieces (by Queinnec). Read also textbooks on compilation and parsing, and on garbage collection and formal (e.g. denotational) semantics. All this may need a dozen years of work.
Notice that, in my experience, embedding software in an interpreter is a very structuring design decision. If you did not think of that at the beginning, you will probably need to redesign and refactor your existing application a lot. For instance, when embedding software in an interpreter, you cannot afford to have bad input crash the program. So error handling and memory management (interfacing to the GC of the interpreter) are challenging and give new constraints. Hence you'll need to re-think your application.
If all this is new (and even if you don't choose e.g. Guile as the embedded interpreter): learn and practice a bit of Scheme (e.g. with Guile or PLT Scheme, for instance by reading SICP), read a little bit about the λ-calculus and closures, then read Queinnec's Lisp In Small Pieces book. Remember the halting problem (which is partly why interpreters are difficult to code).
BTW, the syntax you are proposing (e.g. rotate mat 1 by x 90) is not very readable and looks COBOL-like. If possible, have a language which looks familiar to existing ones. Make it easy to read!
Start by reading all the wikipages I am referencing here.
FWIW, I am the main author of MELT, a domain specific language (inspired a lot by Scheme) to extend the GCC compiler. Some of the papers / documentations I wrote might inspire you (and contain valuable references).
Addenda (after question was reformulated)
You seem to be inventing some formal syntax like
add material 1 to layer 1
rotate layer 1 about x-axis by 90 degrees
translate layer 1 in x-axis by 10 inches
I can't guess what kind of language it is. Are you implementing a 3D printer? If yes, you should stick to some existing standard formal language in that domain.
I believe that such a COBOL-like syntax is really wrong. The point is that it is too verbose, and that you are wishing to implement some domain specific language. I find your example very bad-looking.
Is that syntax your invention, or is there some document specifying it (and many thousands of already existing lines coded in your domain-specific language)? If you are just inventing it, please reconsider the syntax and the semantics.
First, you need to specify on paper the full syntax and semantics of your DSL.
Is your DSL Turing-complete? (I guess that yes, because Turing completeness is reached very quickly - e.g. with variables and loops.) If yes, you are inventing a scripting language. Please don't invent a scripting language without knowing several programming & scripting languages (then read Programming Language Pragmatics...). The point is that, if your scripting language becomes successful, advanced users will sooner or later write important programs in it (e.g. many thousands of lines). Then, these advanced users will be programmers. In that case, it is very important (for social & economic reasons) to have a DSL that is well founded and looks familiar (if possible, an extension of some existing scripting language).
If your DSL already exists, stick to its specification on paper. If that specification is not good enough, improve it with formalization (e.g. by writing some BNF syntax, and some formal (e.g. denotational) semantics for it). Publish and discuss that formalization with existing users.
Several industries got ad-hoc DSLs which became widely used but were ill designed
(e.g., in the French nuclear industry, the Gibiane DSL designed in the 1970s by nuclear physicists, not computer scientists; the US Boeing corporation is also rumored to have made similar mistakes). Then, maintaining and improving the many hundreds of thousands of lines of DSL scripts becomes a nightmare (and may mean losing millions of dollars or euros). So you had better stick to some existing scripting language. The advantages are that there exists some culture around it (e.g. you can find dozens of books on Python or Lua, and many trained engineers familiar with them), that the interpreter is widely used and tested, and that the community working on it keeps improving the interpreter, so it has quite few uncorrected bugs.
You should not attempt to design and implement your own DSL if you are not a trained computer scientist. Stick to some existing scripting language (of course its syntax is not what you want it to be), and leverage existing implementations and experience.
As a counter-example, J. Ousterhout invented the widely used Tcl scripting language with the claim that scripts are always small (e.g. hundreds of lines only) and won't grow into big code bases; unfortunately, some of them did, and Tcl is known as a bad language for coding many tens of thousands of lines (even if Tcl is an easy and convenient language for tiny scripts). The moral of the story is that if a (Turing-complete) scripting language becomes successful, some "crazy" advanced user will write hundreds of thousands of lines of script code. So you need that scripting language to be well designed from the start. Hence, you should adopt and adapt a good existing scripting language (and avoid inventing an unfamiliar syntax without having a good knowledge of several existing scripting languages).
Later additions
PS: my criticism of Tcl is not entirely subjective: the point is that Tcl was designed with small scripts in mind (read J. Ousterhout's first papers about Tcl), but my point is that when you offer a Turing-complete scripting language, some "crazy" user will eventually write huge scripts for it. Hence, you need to anticipate such "crazy" usage by offering a scripting language which "scales up" to big scripts, so is built according to software engineering practices for large code bases.
NB. Lua is probably a good choice as a language to embed. It is small, has a nice implementation, is well documented, and has good performance. But be careful about memory management issues (and this advice holds for any scripting language).
EDIT: To be clearer, I would like to have a short list of keywords
(<15), the order/presence of which would determine which function will
be run.
You can build a small ruleset engine (e.g. something that processes lists of words). You write that engine/function once and just pass the data structures to it.
As an alternative, a solution using regular expressions would be probably the fastest to code (the engine is ready for you), assuming you're familiar with the regexp syntax (if not, it's still a good investment).
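To make the regex alternative concrete, here is a hedged sketch in C++ using std::regex; the pattern and variable names are invented from the question's sample commands, not taken from any existing code.

#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Illustrative only: one pattern per command shape, tried in order.
    // This one matches commands like "rotate layer 1 about x-axis by 45 degrees".
    std::regex rotate_cmd(R"(rotate layer (\d+) about (\w+)-axis by (\d+(?:\.\d+)?) degrees)");

    std::string line = "rotate layer 1 about x-axis by 45 degrees";
    std::smatch m;
    if (std::regex_match(line, m, rotate_cmd))
    {
        int layer = std::stoi(m[1]);        // captured int
        std::string axis = m[2];            // captured axis name
        double angle = std::stod(m[3]);     // captured float
        std::cout << "rotate layer " << layer << " about " << axis
                  << "-axis by " << angle << " degrees\n";
    }
    else
    {
        std::cout << "error: expected 'rotate layer <int> about <axis>-axis by <number> degrees'\n";
    }
}

Each command shape gets its own pattern, tried in order; the capture groups give you the ints, floats and substrings directly, and a failed match is the natural place to print the error message the question asks for.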
You could build a table of keywords and function pointers:
typedef void (*Function_Pointer)(void);

struct table_entry
{
    const char * keyword;        // the command word to match
    Function_Pointer p_function; // the handler to call when the keyword matches
};

// Placeholder handlers; in the real program these would run the corresponding algorithms.
void Process_Car(void)  { /* handle "car" */ }
void Process_Bike(void) { /* handle "bike" */ }

table_entry function_table[] =
{
    {"car",  Process_Car},
    {"bike", Process_Bike},
};
Search the table for a keyword. If the keyword is found, dereference the function pointer.
The following snippet will execute the function for processing the word "car":
(function_table[0].p_function)();
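For completeness, a minimal lookup that searches the table and reports unknown keywords could look like this. It is only a sketch: Dispatch is a hypothetical name, and the code assumes the function_table defined above.

#include <cstdio>
#include <cstring>

void Dispatch(const char * keyword)
{
    const size_t count = sizeof(function_table) / sizeof(function_table[0]);
    for (size_t i = 0; i < count; ++i)
    {
        if (std::strcmp(function_table[i].keyword, keyword) == 0)
        {
            (function_table[i].p_function)(); // found: call the handler
            return;
        }
    }
    std::printf("error: unknown keyword '%s'\n", keyword); // expected keyword not found
}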
There is a famous program, called Eliza, which parses sentences for keywords.
Examples can be found at: Eliza C++ examples
Many years ago, when I didn't know much about object-oriented design, I heard one guy say something like "How can you write a text editor without polymorphism?" I didn't know much about OOP, so I couldn't judge how wise that thought was or ask any specific questions at the time.
Now, after many years of software development (mostly C++), I've used polymorphism many times to solve various problems when designing software. Yet I've never created text editors. So I still can't evaluate that guy's idea.
Is using polymorphism so essential for implementing a text editor in object-oriented languages and why?
Polymorphism for writing a text editor is by no means essential. In fact, polymorphism for solving any programming problem is not essential. It's just one way to do it. Sometimes it makes solving certain kinds of problems easier, and sometimes it just gets in the way.
The evidence for this is that there are perfectly usable text editors developed long before "OOP" became popular.
I would say "no", because it's entirely possible to write perfectly good text editors in non-object-oriented languages, so it can't be that essential.
Polymorphism is a great technique for the problems it addresses, but it's by no means the golden hammer for everything that ails a software developer.
This is a term that was thrown around a lot when OO programming was all the rage. This guy was probably trying to intimidate you with large words; I doubt he fully understood what he was saying, although it is a simple concept when explained.
And here lies the crux of the argument: how many times would you have to write, maintain or extend a text editor? None. Therefore, IMHO, an OO paradigm is of little use for what is a relatively simple piece of code that needs to be highly efficient.
Many of the design patterns, like Memento, Flyweight, etc., that may be used to design/implement a text editor require inheritance and polymorphism.
The other points about polymorphism as being just a tool are spot on.
However if "the guy" did have some experience with writing text editors he may well have been talking about using polymorphism in the implementation of a document composition hierarchy.
Basically this is just a tree of objects that represent the structure of your document including details such as formatting (bold, italic etc) coloring and so on.
(Most web browsers implement something similar in the form of the browser Document Object Model (DOM), although there is certainly no requirement that they use polymorphism.)
Each of these objects inherits from a common base class (often abstract) that defines a method such as Compose().
Then, when it is time to display or to update the structure of the document, the code simply traverses the tree calling the concrete Compose() on each object. Each object is then responsible for composing and rendering itself at the appropriate location in the document.
This is a classic use of polymorphism because it allows new document "components" to be added (or changed) without any (or minimal) change to the main application code.
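A bare-bones sketch of such a hierarchy might look like the following. The names (Glyph, Character, Row) are only illustrative, loosely echoing the document-composition case study in the Design Patterns book, and are not from any particular editor.

#include <memory>
#include <vector>

// Common base class for everything that can appear in the document.
class Glyph
{
public:
    virtual ~Glyph() = default;
    virtual void Compose() = 0; // lay out / render this element
};

class Character : public Glyph
{
public:
    explicit Character(char c) : c_(c) {}
    void Compose() override { /* draw the character at the current position */ }
private:
    char c_;
};

class Row : public Glyph
{
public:
    void Add(std::unique_ptr<Glyph> child) { children_.push_back(std::move(child)); }
    void Compose() override
    {
        for (auto & child : children_)
            child->Compose(); // polymorphic call: each element composes itself
    }
private:
    std::vector<std::unique_ptr<Glyph>> children_;
};

Adding a new kind of component (an image, a table cell, ...) is then just another subclass; the traversal code that calls Compose() never needs to change, which is exactly the benefit described above.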
Once again though, there are many ways to build a text manipulation program; polymorphism is definitely not required to build one.
I once wrote a text editor in Basic. It wasn't a sophisticated text editor by any means, its big highlight being a text-mode windowing thing used for some menus and dialogs, but it still did its job at the time - i.e. it proved I could write a text editor in Basic. I even used it sometimes. I won't be showing the source in public - it's just too embarrassing!
When your text editor is mostly just inserting/deleting characters in a big array of strings and displaying them, little or no abstraction is needed other than the usual provided-as-standard abstractions of arrays and strings.
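In C++ terms (the anecdote above is about Basic, but the idea is the same), that "big array of strings" really can be this simple; the following is a hedged sketch with invented names:

#include <string>
#include <vector>

// Minimal text buffer: one std::string per line.
struct Buffer
{
    std::vector<std::string> lines;

    void InsertChar(std::size_t row, std::size_t col, char c)
    {
        lines.at(row).insert(col, 1, c); // insert one character at (row, col)
    }

    void DeleteChar(std::size_t row, std::size_t col)
    {
        lines.at(row).erase(col, 1);     // delete one character at (row, col)
    }
};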
On the other hand, the amount of text that a text editor on a PC is expected to cope with has increased a lot over the last 20 years, sometimes to the extent that even a modern PC with multiple Gigabytes may not be able to keep the whole file in RAM. On top of that there are character set and encoding issues. A good text editor is expected to remember a (potentially large) number of bookmarks into multiple files, and to maintain them so they refer to the same point despite edits. And then there's syntax highlighting, the ability to record/playback macros, and more.
In short, modern text editors are much more complex than the things used in DOS and on other micros twenty years ago. That complexity is no doubt much easier to manage with a good toolkit for handling abstractions.
While a simple text editor (simpler than edit.com from MS-DOS) may be more easily realized with just a static class (because the functionality is very limited), as soon as you get to menus and dialogs, you will find yourself in dire need of object-oriented language features.
Personally, I frown upon procedural code anyway - I prefer a mixture of OOP (program structure, separation of functionality, etc...) and functional programming (implementation).
This may sound like a religious argument of some sort, but I find my personal style quite recommendable. Usually I need far less lines of code (which are much easier to understand) than most of the developers I work with and my code feels much more "agile" and "flexible".
Try it. :-)
Oh - and polymorphism is not hard to understand. Simply imagine that you (as a person) can be handled as:
a) Man or woman
b) European, Asian, American, African, Oceanian (I hope this is right), etc...
c) By your name
d) By your occupation
But still you are a person - and a living being, and a part of the universe... and YOU.
So for someone who asks you a few questions for statistical reasons, you may be handled as, say, a woman from Oceania (I don't know where you come from, but let's just assume) who is, hm, 42 years old and has lived in, hm, Switzerland for 23 years (hahaha).
For your employer, you may be someone competent in programming and in talking to your colleagues.
However, HOW you fill those roles is dependent on your implementation. This is you.
Is using polymorphism so essential for implementing a text editor in object-oriented languages and why?
Depends on what kind of text editor you're talking about.
You can write notepad without OOP. But you most likely will need OOP for something like MS Word or OpenOffice.
Design Patterns: Elements of Reusable Object-Oriented Software uses a text editor for its examples (i.e. as a "case study") of design pattern application. You may want to check out the book.
While editing this and that in Vim, I often find that its syntax highlighting (for some filetypes) has some defects. I can't remember any examples at the moment, but someone surely will. Usually, it consists of strings badly highlighted in some cases, some things with arithmetic and boolean operators and a few other small things as well.
Now, vim uses regexes for that kinda stuff (its own flavour).
However, I've started to come across editors which, at first glance, have syntax highlighting better taken care of. I've always thought that regexes are the way to go for that kind of stuff.
So I'm wondering, do those editors just have better-written regexes, or do they take care of it in some other way? What way? How is syntax highlighting taken care of when you want it to be "stable"?
And, in your opinion, which editor (your editor of choice) has taken care of it best, and how did it do it (language-wise)?
Edit 1: For example, editors like Emacs, Notepad2, Notepad++, Visual Studio - do you perchance know what mechanism they use for syntax highlighting?
The thought that immediately comes to mind for what you'd want to use instead of regexes for syntax highlighting is parsing. Regexes have a lot of advantages, but as we see with vim's highlighting, there are limits. (If you look for threads about using regexes to analyze XML, you'll find extensive material on why regexes can't do what parsers do.)
Since what we want from syntax highlighting is for it to follow the syntactic structure of the language, which regexes can only approximate, you need to perform some level of real parsing to go beyond what regexes can do. A simple recursive descent lexer will probably do great for most languages, I'm thinking.
Some programming languages have a formal definition/specification written in Backus-Naur Form. All*) programming languages can be described in it. All you then need is some kind of parser for the notation.
*) not verified
For instance, C's BNF definition is "only five pages long".
If you want accurate highlighting, you need real programming, not regular expressions. Regexes are rarely the answer for anything but trivial tasks. To do highlighting in a better way you need to write a simple parser. Parsers basically have separate components that can each do something like identify and consume a quoted string or a number literal. If such a component, looking at its given cursor position, can't consume what's underneath, it does nothing. From that you can parse or highlight fairly simply and easily.
Given something like
static int field = 123;
• The first matcher would skip the whitespace before "static". The keyword, literal, etc. matchers would do nothing because handling whitespace is not their thing.
• The keyword matcher, when positioned over "static", would consume it. Because "s" is not a digit, the literal matcher does nothing. The whitespace skipper does nothing as well because "s" is not a whitespace character.
Naturally your loop continues to advance the cursor over the input string until the end is reached. The ordering of your matchers is of course important.
This approach is both flexible in that it handles syntactically incorrect fragments and is also easy to extend and reuse individual matchers to support highlighting of other languages...
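A bare-bones sketch of such a matcher loop, written in C++ with hypothetical names (the answer names no language), could look like this. Only the whitespace and keyword matchers are shown, and a real keyword matcher would also check for a word boundary.

#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

// Each matcher looks at the cursor; it either consumes some characters
// (returning the new cursor position) or returns the cursor unchanged ("does nothing").
std::size_t MatchWhitespace(const std::string & text, std::size_t pos)
{
    while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos])))
        ++pos;
    return pos;
}

std::size_t MatchKeyword(const std::string & text, std::size_t pos)
{
    static const char * keywords[] = { "static", "int", "return" };
    for (const char * kw : keywords)
    {
        std::string k(kw);
        if (text.compare(pos, k.size(), k) == 0)
        {
            std::cout << "keyword: " << k << "\n"; // here you would emit a highlight span
            return pos + k.size();
        }
    }
    return pos;
}

int main()
{
    std::string line = "static int field = 123;";
    std::size_t pos = 0;
    while (pos < line.size())
    {
        std::size_t next = MatchWhitespace(line, pos);
        if (next == pos) next = MatchKeyword(line, pos);
        if (next == pos) ++next; // nothing matched: advance one character, unstyled
        pos = next;
    }
}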
I suggest the use of REs for syntax highlighting. If it's not working properly, then your RE isn't powerful or complicated enough :-) This is one of those areas where REs shine.
But given that you couldn't supply any examples of failure (so we can tell you what the problem is) or the names of the editors that do it better (so we can tell you how they do it), there's not a lot more we'll be able to give you in an answer.
I've never had any trouble with Vim with the mainstream languages and I've never had a need to use weird esoteric languages, so it suits my purposes fine.