Problems with matches containing space for gtksourceview? - regex

I'm working on improving syntax highlighting for Ada in gtksourceview (currently, it is very outdated and very incomplete). The issue I'm having is that Ada is very positional, so matching many constructs requires matching those positions. I was able to do this in nano fairly easily.
So, let's consider a type declaration such as:
type Trit is range 0..2;
Keywords like "type", "is" and "range" are recognized (and were originally). However, type names were treated as keywords (a bad design decision, as Ada regularly defines new types, even for simple types like integers). What the user gets is the types in Standard being colored and all other types looking like normal text, which defeats the purpose of highlighting. In some languages, recognizing user-defined types might be a notable problem. In Ada, however, the majority of type names occur after one of two regex patterns:
type\s+(\w|\.|_)+
:\s+(\w|\.|_)+
It might just be a matter of implementation (nano and gtksourceview seem to use different regex implementations). At first I thought the problem was recognizing spaces. As it turns out, putting the type context above the keyword context results in type names being highlighted, but then the "type" keyword and the ":" operator are no longer highlighted properly: the whole match, keyword included, gets the "type" style. I was able to override this in nano, resulting in correct highlighting, but I cannot find out how gtksourceview does this.
Here you can see the old gtksourceview definition in action, which doesn't work for a file with many custom types, with my nano definition side by side for comparison. Matching by position is definitely possible and works.
Here is what happens when I put the type context below the keyword context.
Here is what happens when I put the type context above the keyword context.
In both cases the context is the same, just a simple pattern to get started.
<context id="type" style-ref="type">
<match>(type)\s+\w+</match>
</context>
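To make the symptom concrete, here is a minimal sketch in Python rather than in a GtkSourceView language definition. It assumes only that a context with a single style colors the entire matched span. The pattern and the variable names are illustrative, not part of any GtkSourceView API. It shows that the keyword and the type name can be told apart by capture group position within one match.

```python
import re

# Illustrative pattern: group 1 is the keyword, group 2 is the type name.
DECL = re.compile(r"\b(type)\s+([\w.]+)")

line = "type Trit is range 0..2;"
m = DECL.search(line)

whole_span = m.span(0)    # keyword + whitespace + name: what one style would cover
keyword_span = m.span(1)  # only the "type" keyword
name_span = m.span(2)     # only the declared type name

print("whole match :", line[slice(*whole_span)])
print("keyword     :", line[slice(*keyword_span)])
print("type name   :", line[slice(*name_span)])
```

If the highlighter can style sub-matches separately, the keyword span and the name span are the two pieces that need different styles.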

You may want to consider generating the parser from the formal description of the Ada syntax in Annex P of the Language Reference Manual.
Unfortunately, this doesn't answer your question of how to formulate the syntax for GtkSourceView.

Related

Enable Sublime Text 3 or 4's syntax highlighting for custom types

I'm trying to set a specific color in Sublime Text (coding C/C++) for custom types such as ones created with typedefs. By default, ST3/4 seems to treat these custom types as variable names, hence coloring them in the same way.
I found this question, which is almost the same but is about Vim; mine is specific to ST.
My problem is identical:
Here you see InitMyStruct takes a pointer to MyStruct as an argument and returns one, but they are not colored like a type which is, in my opinion, misleading.
Is there a way to make Sublime Text 3 or 4 change the color of those custom types? I have the same request for references to #defined elements: with #define ERROR_CODE (-1), ERROR_CODE appears as an ordinary variable elsewhere in the code.
I tried looking into preferences for color scheme but there is no Sublime scope for that. I doubt this is a problem with my theme/color scheme as I can't find a single color scheme which handles that.
The reason they aren’t colored as types is because they aren’t built-in. You’ve declared it elsewhere in the file, or perhaps even elsewhere in the project. Regex-based syntax highlighting (mostly) only knows about the string of characters on a given line, so it can only provide scopes for what it can anticipate a priori based purely on the syntax.
Your best bet to get relevant scopes for these types would be to use LSP. I haven’t used it, though, so I’m not sure whether it will actually help in this scenario.
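The limitation described above can be sketched with a toy classifier in Python. Everything here is hypothetical (the word list, the function names, the simplified typedef pattern) and is not how Sublime Text is implemented. It only illustrates that a rule with a fixed word list cannot label MyStruct as a type unless information from another line is brought in.

```python
import re

# A fixed list of names that can be known a priori from the language alone.
BUILTIN_TYPES = {"int", "char", "float", "double", "void"}

def classify(token, known_types=BUILTIN_TYPES):
    """Label a token the way a fixed word-list rule would."""
    return "type" if token in known_types else "variable"

def declared_types(source):
    """Collect names introduced by simple 'typedef ... Name;' lines (toy pattern)."""
    return set(re.findall(r"\btypedef\b[^;]*?\b(\w+)\s*;", source))

source = (
    "typedef struct MyStruct_s MyStruct;\n"
    "MyStruct *InitMyStruct(MyStruct *p);\n"
)

# Looking at the token alone, MyStruct is indistinguishable from a variable.
line_only = classify("MyStruct")

# Only with knowledge gathered from the rest of the file does it become a type.
with_context = classify("MyStruct", BUILTIN_TYPES | declared_types(source))

print(line_only, with_context)
```

Gathering that extra knowledge across a file or project is the kind of work a language server does, which is why it is suggested above.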

Why does Crystal's macro syntax for iterating differ from the rest of Crystal

Coming from the Ruby world, I instantly understood why Crystal chose not to implement a for method. But then I was surprised to see that Crystal does implement a for method for macros. I was even more surprised to find that macros don't allow an enumerable (.each, etc) syntax (i.e. {% ["one", "two", "three"].each do |value| %} isn't valid macro syntax).
Is there a logical reason for this syntax difference? It's possible that the answer is simply ~"because the devs decided that macro syntax looks like x, and non-macro syntax looks like y", but I'm guessing that there is more to it than that (an arbitrary syntax inconsistency seems like a flaw).
Thanks!
The main reason is that when the parser parses foo.bar do |arg| ... end, it expects an expression after |arg|, not %}, which is a parse error. So to allow that we'd need to enhance the parser (which is already quite complex) to take that into account. for was decided because of this, but also to make it clear that it's just not regular crystal but a different thing (it's an interpreted subset of crystal and the standard library).
Another reason is that if each and other iteration methods are allowed, why not while and until? That could allow endless loops in macros, which with just for aren't possible, so you can guarantee a macro finishes executing. Which... is actually not true given that we have run inside macros.
So I think I'm not opposed to changing the language to allow each, each_with_index, etc., inside macros with that syntax, and eventually removing for from the macro language. Opening an issue requesting this is a good step in this direction.

How to identify which forms are macros and which are functions while looking at a Clojure code?

Lisp/Clojure code has a consistent syntax, which is a plus point, as one doesn't need to understand various different constructs.
But at times it is easier to understand a piece of code just from the different syntax being used (this is a switch case, this is a pattern-matching construct, etc.) without actually reading the text.
I started out with Clojure a couple of months ago and have realized I can't understand the code without reading the name of the form and then googling whether it is a macro or a function and how it works.
So it turns out that a piece of Clojure code, irrespective of the uniformity of the syntax, isn't uniform.
A form may look like a function, but if it is actually a macro, it might not evaluate all its arguments.
Is there a naming convention or indentation style that all macros use, so that it is easier for someone to grasp from the name what is going on?
The most useful intuition in my opinion comes from understanding the purpose of a given operator / Var. Well-designed macros simply could not be written as functions and still offer the same functionality with the same syntax, for if they could, they would in fact be written as functions (see the "well-designed" part above!). [1] So, if you're dealing with a construct which couldn't possibly be a regular function, then you know it isn't; otherwise it likely is.
Additionally, the usual ways of learning about the Vars exported by a library tell you whether you're dealing with a macro or a function up front. That is true of doc ((doc foo) says that foo is a macro near the top of its output if that is indeed the case), source (since it gives you the entire code) and M-. (jump to definition in Emacs with nrepl.el or swank-clojure; M-, jumps back). Documentation may be expected to mention what is a macro and what isn't (except that's not necessarily true of docstrings, since all usual ways of accessing a docstring already tell you whether you're dealing with a macro, as explained above).
If you're skimming a body of code with the intention of forming a rough understanding of what it probably does on the assumption that the various operators perform the functions suggested by their names, then either (1) the names are suggestive enough and you get an idea of what's intended by the code, so you don't even need to care which operators happen to be macros, or (2) the names are not suggestive enough, so you'll need to dive into the docs or the source for some of the operators anyway, and then the first thing you'll learn is which of them are registered as macros.
Finally, there is no single naming style for macros, although there are certain conventions specific to particular use cases. For example with-foo-style constructs tend to be convenience macros whose purpose is to simplify dealing with resources of type foo; dofoo-style constructs tend to be macros which take a body of expressions to be executed (how many times and with which additional context set up depends on the macro; the most basic member of this family, do, is actually a special form rather than a macro); deffoo-style constructs introduce new Vars or type-like entities.
It's worth pointing out that similar patterns are sometimes broken. For instance, most threading constructs (-> & Co.) are macros, but xml-> from clojure.data.zip.xml is a function. That makes perfect sense when one considers the functionality provided, which brings us back to the point about the purpose of an operator being the most useful source of intuition.
[1] There might be some exceptions to this rule. One would expect these to be documented. Some projects are of course not documented at all (or very nearly so); here the issue goes away completely, since one must go to the source to make sense of things anyway.
There are two attributes that typically distinguish a macro (or sometimes special form) from a function:
When the form does some sort of binding (i.e. declaring new identifiers for later use)
When some of the arguments are evaluated lazily
Examples of the first case are let, letfn, binding and with-local-vars. Strangely though, in the clojure.core source defn is written as a plain fn and only afterwards marked as a macro, but I'm pretty sure that has something to do with Clojure's bootstrapping process (defn is defined before defmacro is defined).
Examples of the second would be and, or and lazy-seq. In all these constructs, the arguments are evaluated lazily by either putting them in conditional branches (like if) or moving them inside a function body.
Both of those attributes are really just manifestations of the macro manipulating the Clojure syntax. I don't think the threading macros (-> and ->>) fit very well into either of those categories, but the nil-safe versions (-?> and -?>>) kind of fall under having lazy arguments.
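The "moving them inside a function body" idea can be sketched outside Clojure as well. The following is a Python illustration with hypothetical helper names (eager_and, lazy_and, noisy), not Clojure's actual implementation of and. It shows that an ordinary function always receives already-evaluated arguments, while wrapping each argument in a zero-argument function lets the callee skip the ones it does not need.

```python
def eager_and(*values):
    """A plain function: every argument is already evaluated on entry."""
    for v in values:
        if not v:
            return v
    return values[-1] if values else True

def lazy_and(*thunks):
    """Takes zero-argument callables and stops at the first falsy result."""
    result = True
    for thunk in thunks:
        result = thunk()
        if not result:
            return result
    return result

calls = []

def noisy(value):
    """Record that this argument expression was evaluated, then return it."""
    calls.append(value)
    return value

eager_and(noisy(False), noisy("second"))
eager_calls = list(calls)   # both argument expressions ran

calls.clear()
lazy_and(lambda: noisy(False), lambda: noisy("second"))
lazy_calls = list(calls)    # evaluation stopped after the first falsy value

print(eager_calls, lazy_calls)
```

A macro performs this kind of wrapping on the caller's behalf, which is why the call site still looks like an ordinary function call.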
As far as I know there is no enforced naming convention.
As a rule of thumb, functions are preferred wherever possible, but macros can sometimes be spotted when they follow the pattern def<something> for setting up a something or with-<resource> for doing something with an open resource.
Because of this, you may find Clojure's doc macro helpful. It will tell you whether a form is a macro, a function or a special form, and will also give its arg list and doc string (if present). For example:
(use 'clojure.repl)
(doc and)
This prints the following at the REPL:
clojure.core/and
([] [x] [x & next])
Macro
Evaluates exprs one at a time, from left to right. If a form
returns logical false (nil or false), and returns that value and
doesn't evaluate any of the other expressions, otherwise it returns
the value of the last expr. (and) returns true.
Some editors (e.g. emacs) will provide this documentation as a pop-up or on a key combination, which makes accessing it (and reading) much faster.

How can I use different colors for virtual and virtual pure methods?

I am trying to get different colors for virtual and pure virtual methods, like this:
syn region cppVirtualPureMethod start="virtual" end="= 0;"
syn region cppVirtualMethod start="virtual" end="[;{]"
Unfortunately, the selection is performed using only the start identifier, so it cannot disambiguate between the two cases. Is there some Vim trick to obtain what I need?
As you've already found out, :syn region only considers the start= portion for a match. You have to use :syn match (potentially with a costly regular expression that matches across lines) to include the differentiating end.
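The idea of putting the differentiating end into the pattern itself can be sketched with ordinary regular expressions. This is a Python illustration, not Vim syntax. The two patterns are simplified assumptions that only handle declarations written without comments or nested braces before the terminator, and the returned names merely reuse the group names from the question.

```python
import re

# Pure virtual: "virtual ... = 0;" with no ';' or '{' before the "= 0;".
PURE_VIRTUAL = re.compile(r"\bvirtual\b[^;{]*=\s*0\s*;")
# Any virtual declaration, up to its first ';' or '{'.
VIRTUAL = re.compile(r"\bvirtual\b[^;{]*[;{]")

def classify(decl):
    """Return the highlight group a declaration would fall into, or None."""
    if PURE_VIRTUAL.search(decl):   # test the more specific pattern first
        return "cppVirtualPureMethod"
    if VIRTUAL.search(decl):
        return "cppVirtualMethod"
    return None

for decl in (
    "virtual void draw() = 0;",
    "virtual void draw();",
    "virtual int size() { return 0; }",
    "virtual void set(int x = 0);",
    "void helper();",
):
    print(decl, "->", classify(decl))
```

The more specific pattern is tried first, so a pure virtual declaration is never claimed by the general one.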
In general (considering that you've attempted something similar beforehand), such elaborate highlighting is difficult to do in Vim, whose syntax parsing is designed for broad applicability and 80/20-correctness, not exact grammar representations. If you really need such fine nuances displayed in different visual styles (especially for C++, which has a very complex grammar), I'd rather use an IDE with a proper parser for the full language.

What does "statically typed" and "free-form" mean for C++?

In the C++ tag wiki, it is mentioned that
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language.
Can someone please explain the terms "statically typed" and "free-form"?
Thanks.
A statically-typed language is a language where every variable has a type assigned to it at compile-time. In C++, this means that you must tell the compiler the type of each variable - that is, whether it's an int, or a double, or a string, etc. This contrasts with dynamically-typed languages like JavaScript or PHP, where each variable can hold any type, and that type can change at runtime.
A free-form language is one where there are no requirements about where various symbols have to go with regard to one another. You can add as much whitespace as you'd like (or leave out any whitespace that you don't like). You don't need to start statements on a new line, and can put the braces around code blocks anywhere you'd like. This has led to a few holy wars about The Right Way To Write C++, but I actually like the freedom it gives you.
Hope this helps!
"Statically typed" means that the types are checked at compile-time, not run-time. For example, if you write a class that does not have a foo() method, then you'll get a compile-time error if you try to call foo() on an object of that class. In dynamically-typed languages (e.g. Ruby), you would still get an error, but only at run-time.
"Free-form" means that you can use whitespace however you want (i.e. write the whole program on one line, use uneven indenting, put lots of blank lines, etc.). This is in contrast to languages like Python where whitespace is semantically significant.
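Python, mentioned above for its significant whitespace, is also dynamically typed, so it can illustrate the run-time half of the contrast. This is a minimal sketch with a hypothetical Widget class. The call to a missing foo() method is only detected when that line actually executes, whereas a C++ compiler would reject the equivalent call before the program ever runs.

```python
class Widget:
    """A class that deliberately has no foo() method."""

    def bar(self):
        return "bar"

def call_foo(obj):
    # Nothing checks this call ahead of time: the attribute lookup
    # happens only when this line runs.
    return obj.foo()

w = Widget()
try:
    call_foo(w)
    outcome = "no error"
except AttributeError:
    outcome = "AttributeError at run time"

print(outcome)
```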
Statically typed: the compiler knows what the types of all variables are. This is in contrast to languages like Python and Common Lisp, where the types of variables can change at runtime.
Free-form: no specific whitespace requirements. This is in contrast to old-style FORTRAN and COBOL, so I'm not sure how useful this designation is anymore.