I'm currently experimenting with a programming language. I defined the basic syntax and wrote a pretty simple parser some months ago. Today I wanted to continue the project, but after a short time there was something about the syntax that bothered me.
The end-of-statement marker
When I started the project, I thought using line breaks as the end of a statement would be nice. Like this:
public fnc addPerson: Person personInstance
{
[this.collection.add: personInstance]
return this
}
Now I think it would look and feel much better using semicolons, which would also allow putting the entire thing on one line.
public fnc addPerson: Person personInstance
{
[this.collection.add: personInstance]; return this;
}
I really wonder: from an objective (not technical) perspective, what are the pros and cons of each?
I mean, using line breaks will force the developer (at least a bit) to write clean code, but it makes the language pretty inflexible.
What kind of problems will you probably run into (as a user of the language) with line breaks as the end of statement?
What language-feature limitations will I have to accept with line breaks or with semicolons?
We had all this before, with assembler, RPG, COBOL, and various other tabular languages where line terminators were significant. They were harder to write compilers for. We don't need to go back there.
When this was first done back then, everybody realized the need for a statement continuation indicator, so you could break statements over multiple lines. Now that Scala et al. have reintroduced this, they've forgotten that part of it, so it becomes impossible to present long statements in an acceptable format.
Not a good idea. Whitespace is whitespace, not syntax.
• I would expect that, for most people, line breaks are easier to read.
• Using line breaks would introduce readability problems for very long lines.
• I don't see how one would impose any limitations over the other.
The big problem with using newline as the statement separator is not that you can't write multiple statements on one line (you shouldn't do that anyway; it makes it too easy to miss an important part of the code).
The problem is that it makes it hard to write one long statement over several lines. For a language where this becomes a problem, have a look at JavaScript and its automatic semicolon insertion.
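To make the pitfall concrete, here is a minimal JavaScript sketch of the classic automatic-semicolon-insertion surprise (the function name is made up for illustration): the parser treats the line break after return as the end of the statement, so the object literal below it is never returned.
function getConfig() {
  return          // ASI silently turns this line into "return;"
  {
    ok: true      // parsed as a block containing a label, not as the return value
  };
}
console.log(getConfig()); // prints "undefined", not { ok: true }
A semicolon-terminated language has the opposite trade-off: forgetting a semicolon is an error, but a statement can always be continued on the next line.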
Is there any reason why the expression
(foo5 (foo4 (foo3 (foo2 (foo1 arg)))))
cannot be replaced with
(foo5 (foo4 (foo3 (foo2 (foo1 arg)-)
or the like, and then expanded back?
I know the lack of reader macros means that you cannot change the syntax, but could this expansion possibly be hard-coded into the Java?
I do this when I hand write code.
Yes, you could do this, even without reader macros (in fact, you can change Clojure's syntax with a bit of hacking).
But the question is, what would it gain you? Would it always expand to the top level? But then cutting and pasting code would fail if you moved it to or from the top level. And, of course, all the various tools that operate on Clojure syntax would need to understand it.
Ultimately, if you really dislike all the close parens, why not use
(-> arg foo1 foo2 foo3 foo4)
instead?
Yes, this could be done, but I'm not sure it is the right solution and there are a number of negatives which will likely outweigh the benefits.
Suggestions like this are often the result of poor coding tools and a 'traditional' conceptual model for writing code. Selecting the right tools and looking at your code from a slightly different perspective will usually eliminate the cause that leads to this type of suggestion.
Most of the non-functional, non-Lispy languages are based around a token-and-line model of code. You tend to think of the code in terms of lines of tokens and you tend to edit the code on this basis. There is typically less nesting of expressions, and lines are usually terminated with some marker, such as a semicolon. Likewise, tools such as your editor have features which have evolved to support token- and line-based editing. They are good at it.
The Lisp-style languages are less focused on lines of tokens. The emphasis here is on list forms. Lines of tokens are replaced with nested lists of symbols - the line is less relevant and you typically have a lot more nesting of forms. This change means your standard line-oriented tools, like your editor, are less suitable. The typical mental model of the code as lines of tokens is also less useful.
With languages like Clojure, you're better off thinking in terms of list forms and not lines of code. Once you make this transition, you then start looking for tools which also model the code along these lines. For example, you either look for editors specifically designed to work with lists of data rather than lines of data, or you look for editors which have extensions that allow you to work with lists.
Once your editor understands that lists are the fundamental grouping unit, not lines, things like parentheses become largely irrelevant from a code writing/editing perspective. You don't worry about closing parentheses, counting parenthesis nesting levels, etc. This all gets managed by the editor automatically. You don't move by lines, you move by lists; you don't kill/delete a line, you kill a list; you don't cut and copy a block of lines, you cut and copy a list of lists; and so on.
The good news is that in many respects, the structure of these list-based code representations is actually easier to manipulate than that of most line-based languages. This is primarily because there is less ambiguity or complexity. There are fewer exceptions to the rules, and the rules are inherently simple. As a consequence, many editors designed for programmers will have support for this style of coding as well as advanced features which are difficult to implement in less structured code.
My suspicion is that your suggestion to have an additional bit of syntactic sugar to avoid having to type multiple closing parentheses is actually a symptom of not having the right tools to write your code. Once you do, you will almost never need to enter a closing parenthesis or count opening parens to ensure you get the nesting right. This will be handled by the editor. Your biggest challenge will be in shifting your mental model to think in terms of lists and lists of lists. The parens will become largely invisible and you will jump around in your code according to list units rather than line units. The change is not easy and it can take some time to re-train your brain and fingers, but once you do, you will likely be surprised at how quickly you begin to edit and manipulate your code.
If you're an Emacs user, I highly recommend extensions such as paredit and lispy. If you're using some other editor, look for paredit-type extensions. However, as these are extensions, you must also spend some time training yourself to use whatever key bindings the extension provides - there is no point having an extension with great list-based code navigation if you still just arrow around with the arrow keys (unless it is Emacs and you have re-bound those arrow keys to use the paredit navigation bindings).
Indentation and pretty printing are both used to improve the clarity and readability of a program, and both use spaces. How can I distinguish between them?
Pretty printing is a method used to make your code easily readable and understandable. Wikipedia explains pretty printing as follows:
Prettyprint (or pretty-print) is the application of any of various
stylistic formatting conventions to text files, such as source code,
markup, and similar kinds of content. These formatting conventions can
adjust positioning and spacing (indent style), add color and contrast
(syntax highlighting), adjust size, and make similar modifications
intended to make the content easier for people to view, read, and
understand. Prettyprinters for programming language source code are
sometimes called code beautifiers or syntax highlighters.
Now let's see what indentation is:
In the written form of many languages, an indentation is an empty
space at the beginning of a line to signal the start of a new
paragraph. Many computer languages have adopted this technique to
designate "paragraphs" or other logical blocks in the program.
In computer programming languages, indentation is used to format
program source code to improve readability. Indentation is generally
only of use to programmers; compilers and interpreters rarely care how
much whitespace is present in between programming statements.
From these, one can understand that indentation is one way of implementing pretty printing.
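As a small illustration (an arbitrary JavaScript snippet, chosen just for this example): both versions below mean the same thing to the compiler, but the second has been pretty-printed, and indentation is only one of the conventions the pretty printer applies (the others here are spacing around operators and one statement per line).
// As typed:
function sum(xs){let t=0;for(const x of xs){t+=x;}return t;}
// After pretty printing:
function sum(xs) {
  let t = 0;
  for (const x of xs) {
    t += x;
  }
  return t;
}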
What is the best way to find the matching RPAREN in code?
For instance, I have this pseudo-code:
if(a && (b || "c)"))
        ^---------^  CASE A
  ^----------------^ CASE B
For instance, if I consider the first LPAREN, it needs to match the last RPAREN (case B). If I consider the second LPAREN, it needs to match the second-to-last RPAREN (case A).
Note that the string "c)" contains an RPAREN, but it needs to be ignored in this case.
Well... I thought about regex, but I guess it would be very complex (note that I need to match strings, regexes, and other things that can include an RPAREN or something like that). Then I thought about using a manual scan (via code) to detect each part (like a manual regex).
I need this to parse code in a language that I'm building (my own programming language), and I want to skip reading some code to make it faster.
For instance:
function a() { return 1; }
function b() { return 2; }
alert(b());
In this case, only b() needs to be parsed, because a() is never used. So I will scan for the opening { and ignore (but store) everything until the real closing }. If the function is used, it will be parsed.
My questions:
Regex or manual code?
Is it a good or a bad thing to do? Will ignoring code that is never used help improve the speed of the parser?
Off-topic: any tips to make the parser faster? Maybe a "pre-parsed" file that stores the language code as computer code (opcodes?)?
A regular expression cannot match nested parens - it isn't possible, because balanced nesting is not a regular language.
One way to parse a language like this is lex (tokenize) and yacc (parser) - you can find lots of information on the net.
Adding optimizations to the parser is unlikely to make it parse faster but can improve the performance of the resulting code. Good and bad are moral judgements, I don't know what they mean here.
Matching patterns in the source and substituting precompiled, optimized code can give you an improved result, but whether it speeds up parsing depends on how often the patterns appear in the code.
If you are building your own language, you should really learn about the standard methods of processing language source code. (You're welcome to propose clever new ideas, but most such ideas turn out not to be so clever, and if you know the standard ones it is often pretty obvious why.)
You really can't process your code and "match" parentheses with a pure regular expression. You need some kind of push-down automaton or counting engine to match nested parentheses (or anything else that might match, e.g., braces, IF and ENDIF, ...), often called a "parser" in the context of such tasks.
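As a rough sketch of what such a counting engine looks like (plain JavaScript, with a hypothetical helper name, and only double-quoted strings handled - no escape sequences):
// Returns the index of the ")" matching the "(" at openIndex,
// ignoring any parentheses that appear inside string literals.
function findMatchingParen(source, openIndex) {
  let depth = 0;
  let inString = false;
  for (let i = openIndex; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === '"') inString = false;  // end of the string literal
      continue;                          // ignore everything inside strings
    }
    if (ch === '"') inString = true;
    else if (ch === '(') depth++;
    else if (ch === ')') {
      depth--;
      if (depth === 0) return i;         // this ")" closes the original "("
    }
  }
  return -1;                             // unbalanced input
}
// findMatchingParen('if(a && (b || "c)"))', 2) returns the index of the final ")"
// (case B); starting from the second "(" returns the second-to-last ")" (case A).
The same depth-counting idea works for the "scan for { and skip to the real }" trick in the question; a real lexer would also have to skip comments, character literals, and escape sequences.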
Regarding your 3 questions:
1) Regex or manual code?
Learn about/use parser generators instead, for instance ANTLR.
2) Is it a good or a bad thing to do? Will ignoring code that is never used help improve the speed of the parser?
This is really a "premature" optimization. It's better to simply get a fast parsing engine. ANTLR is pretty good; I doubt you would notice the difference. If you insist on blazing speed, consider LRSTAR instead; the guy who built it spent the last decade micro-optimizing its generated parsers, and they're extremely fast.
And given that you are trying to implement a programming language, I'd suggest you worry about the much bigger issues of actually defining it crisply, building a working parser, and struggling with executing it in a practical way (whether that means interpreting or compiling doesn't matter). Given your apparent level of understanding of the parsing business, I suspect you really aren't ready to do this. You'd be better off spending some time learning how compilers work in general so you have a reference point.
3) Off-topic: any tips to make the parser faster? Maybe a "pre-parsed" file that stores the language code as computer code (opcodes?)?
You can speed up the parser by preprocessing the text and storing it as a set of tokens. You can also speed it up by storing the result of the previous parse, under the assumption that it hasn't changed. (Most source files in big systems of code don't change, even though they may get recompiled a lot.) You can even store the compiled code in some representation along with the source text to avoid compiling it. [I've considered storing the compiled code for individual functions like this; even when a file is edited, most of the functions in it don't change.] These tricks all have troubles: how do you get the programmer and the editors to cooperate in setting all this up? It's a lot easier to just build a fast parser, and you should start there and worry about the fancy tricks later.
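A rough sketch of the "reuse the previous parse" idea (Node.js-flavoured JavaScript; parseSource stands for whatever parser you already have): key a cache on a hash of the source text, so an unchanged file is never parsed twice.
const crypto = require('crypto');
const parseCache = new Map();             // content hash -> previous parse result
function parseWithCache(sourceText, parseSource) {
  const key = crypto.createHash('sha256').update(sourceText).digest('hex');
  if (parseCache.has(key)) {
    return parseCache.get(key);           // source unchanged: reuse the old result
  }
  const result = parseSource(sourceText); // the slow path: actually parse
  parseCache.set(key, result);
  return result;
}
In a real build you would persist the cache to disk and invalidate it when the compiler version changes, which is exactly the kind of extra bookkeeping mentioned above.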
I have worked with Java for a while now, and I found Checkstyle to be very useful. I am starting to work with C++ and I was wondering if there is a style checker with similar functionality. I am mainly looking for the ability to write customized checks.
What about Vera++?
Vera++ is a programmable tool for verification, analysis and transformation of C++ source code.
Vera++ is mainly an engine that parses C++ source files and presents the result of this parsing to scripts in the form of various collections - the scripts are actually performing the requested tasks.
Here is a sample from a more complete demo of what it can do:
crc.hpp:157: keyword 'explicit' not followed by a single space
crc.hpp:588: closing curly bracket not in the same line or column
dynamic_property_map.hpp:82: keyword 'if' not followed by a single space
functional.hpp:106: line is longer than 100 characters
multi_index_container.hpp:472: comma should be followed by whitespace
version.hpp:37: too many consecutive empty lines
weak_ptr.hpp:108: keyword 'catch' not followed by a single space
...
I have had good feedback about Artistic Style, which lets you apply a uniform style to code without too much hassle.
It's free and there are plenty of "classic" styles already defined. It might not work with new C++0x constructs, though.
I am also expecting a Clang-based library to appear, though I haven't found one to date. Normally, given Clang's structure, it should be relatively easy, but it's always easier to say than to code, and I guess nobody has taken the time yet.
KWStyle seems to be a lightweight fit
While editing this and that in Vim, I often find that its syntax highlighting (for some filetypes) has some defects. I can't remember any examples at the moment, but someone surely will. Usually, it consists of strings badly highlighted in some cases, some things with arithmetic and boolean operators and a few other small things as well.
Now, Vim uses regexes for that kind of stuff (its own flavour).
However, I've started to come across editors which, at first glance, have syntax highlighting better taken care of. I've always thought that regexes are the way to go for that kind of stuff.
So I'm wondering: do those editors just have better-written regexes, or do they take care of it in some other way? What way? How is syntax highlighting taken care of when you want it to be "stable"?
And in your opinion, which editor (your editor of choice) has taken care of it best, and how does it do it (language-wise)?
Edit 1: For example, editors like Emacs, Notepad2, Notepad++, Visual Studio - do you perchance know what mechanism they use for syntax highlighting?
The thought that immediately comes to mind for what you'd want to use instead of regexes for syntax highlighting is parsing. Regexes have a lot of advantages, but as we see with vim's highlighting, there are limits. (If you look for threads about using regexes to analyze XML, you'll find extensive material on why regexes can't do what parsers do.)
Since what we want from syntax highlighting is for it to follow the syntactic structure of the language, which regexes can only approximate, you need to perform some level of real parsing to go beyond what regexes can do. A simple recursive descent lexer will probably do great for most languages, I'm thinking.
Some programming languages have a formal definition/specification written in Backus-Naur Form. All*) programming languages can be described in it. All you then need is some kind of parser for the notation.
*) not verified
For instance, C's BNF definition is "only five pages long".
If you want accurate highlighting, you need real programming, not regular expressions. Regexes are rarely the answer for anything but trivial tasks. To do highlighting in a better way, you need to write a simple parser. Parsers basically have separate components that each do something like identify and consume a quoted string or a number literal. If such a component, looking at its given cursor position, can't consume what's underneath, it does nothing. From that you can parse or highlight fairly simply and easily.
Given something like
static int field = 123;
• The first matcher would skip any whitespace before "static". The keyword, literal, etc. matchers would do nothing, because handling whitespace is not their thing.
• The keyword matcher, when positioned over "static", would consume it. Because "s" is not a digit, the literal matcher does nothing. The whitespace skipper does nothing as well, because "s" is not a whitespace character.
Naturally your loop continues to advance the cursor over the input string until the end is reached. The ordering of your matchers is of course important.
This approach is flexible in that it handles syntactically incorrect fragments, and it is also easy to extend and to reuse individual matchers to support highlighting of other languages...
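A minimal sketch of that loop in JavaScript (the matcher set and token names are invented for illustration, and the keyword matcher skips the word-boundary check for brevity):
// Each matcher looks at `src` at position `pos` and either consumes characters
// (returning the new position) or returns `pos` unchanged to decline.
const keywords = ['static', 'int', 'return'];
function matchWhitespace(src, pos) {
  let i = pos;
  while (i < src.length && ' \t\r\n'.includes(src[i])) i++;
  return i;
}
function matchKeyword(src, pos) {
  for (const kw of keywords) {
    if (src.startsWith(kw, pos)) return pos + kw.length;
  }
  return pos;
}
function matchNumber(src, pos) {
  let i = pos;
  while (i < src.length && src[i] >= '0' && src[i] <= '9') i++;
  return i;
}
const matchers = [
  { type: 'whitespace', match: matchWhitespace },
  { type: 'keyword',    match: matchKeyword },
  { type: 'number',     match: matchNumber },
];
function highlight(src) {
  const tokens = [];
  let pos = 0;
  while (pos < src.length) {
    let advanced = false;
    for (const { type, match } of matchers) {
      const next = match(src, pos);
      if (next > pos) {                           // this matcher consumed something
        tokens.push({ type, text: src.slice(pos, next) });
        pos = next;
        advanced = true;
        break;
      }
    }
    if (!advanced) {                              // nothing matched: emit one plain character
      tokens.push({ type: 'plain', text: src[pos] });
      pos += 1;
    }
  }
  return tokens;
}
// highlight('static int field = 123;') tags "static" and "int" as keywords and
// "123" as a number; everything the matchers don't recognise falls through as
// "plain", which is what makes the approach tolerant of broken fragments.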
I suggest the use of REs for syntax highlighting. If it's not working properly, then your RE isn't powerful or complicated enough :-) This is one of those areas where REs shine.
But given that you couldn't supply any examples of failure (so we can tell you what the problem is) or the names of the editors that do it better (so we can tell you how they do it), there's not a lot more we'll be able to give you in an answer.
I've never had any trouble with Vim with the mainstream languages and I've never had a need to use weird esoteric languages, so it suits my purposes fine.