Vim syntax: Matching namespace qualified symbols - clojure

I'm using vim-clojure-static, a Clojure plugin. I can add my own functions and macros to a syntax group by doing for example:
syntax keyword clojureMacro defsystem
But in Clojure, after one has required (imported) code from other namespaces, one has to namespace-qualify the functions and macros. For example, if I required the namespace my-namespace and defsystem was in my-namespace, I would have to refer to it as my-namespace/defsystem. But when requiring a namespace, one may shorten (actually alias) its name, giving, for example, my/defsystem.
So, the problem: syntax keyword clojureMacro defsystem does not work if defsystem is namespace qualified, like my/defsystem. And the namespace qualifier can be anything. How can I fix that? The regex '\m[a-z.+\-*_!?]\+\/' matches namespace qualifiers. So basically I want code that matches '\m[a-z.+\-*_!?]\+\/' immediately followed by a clojureMacro to be highlighted as if the whole thing were a clojureMacro.
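To make the intended matching rule concrete, here is a minimal sketch in Python rather than Vim script. It only illustrates the pattern: an optional namespace qualifier (the regex from the question, translated to Python syntax) followed by one of a known set of macro names. The MACROS list and the function names are assumptions made up for this illustration, not part of vim-clojure-static.

```python
import re

# Hypothetical set of macro names that should be highlighted.
MACROS = ["defsystem", "defn", "when"]

# The qualifier regex from the question, in Python syntax, made optional.
QUALIFIER = r"(?:[a-z.+\-*_!?]+/)?"
# Characters treated as part of a symbol; used to emulate keyword boundaries.
SYMBOL_CHARS = r"a-zA-Z0-9.+\-*_!?/"


def qualified_macro_pattern(macros):
    """Build a regex matching each macro name, bare or namespace qualified."""
    names = "|".join(re.escape(m) for m in sorted(macros, key=len, reverse=True))
    return re.compile(
        rf"(?<![{SYMBOL_CHARS}]){QUALIFIER}(?:{names})(?![{SYMBOL_CHARS}])"
    )


def find_macros(code, macros=MACROS):
    """Return every bare or qualified macro symbol found in code."""
    return [m.group(0) for m in qualified_macro_pattern(macros).finditer(code)]
```

In Vim itself the same idea would presumably be expressed as a syntax match pattern rather than a syntax keyword, since the qualifier is a regex and not a fixed word.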

It doesn't really answer your question, but anyhow.
Back in the day, VimClojure provided so-called "dynamic highlighting." It would inspect the required namespaces (and their aliases) and dynamically add the symbols of the referenced namespaces to the highlighting. So if you typed "m/defsystem" it would be highlighted, but "m/non-existant" or "not-m/defsystem" would not. The highlighting respected whether a symbol was a macro, a function, etc. Should you ever change the namespace alias from "m" to something else, the highlighting would (almost) automatically adapt. However, it needed a backend server running.
Maybe you want to ping the guys on the vimclojure google group. It's all about vim and clojure. Maybe someone there is willing to lend a hand and carry over this functionality as a fireplace extension.

For anyone out there still looking for a solution, I've reimplemented VimClojure's dynamic highlighting feature as a fireplace.vim plugin:
https://github.com/guns/vim-clojure-highlight

Related

Need regex to locate c++ namespace declarations with unknown names

I'm trying to understand open source c++ code and I need a way to create a list of all the declared namespaces. I'm writing my code in Xojo (realbasic), which has built-in regex handling.
My problem is I'm not familiar enough with regular expressions to construct the correct expression to locate "namespace " followed by an unknown name then " {" all on the same line of text.
I can code everything else myself, I just need the proper regular expression. All help appreciated.
You may try namespace\s+(\w+)\s*\{ for the most common cases, i.e. those without comments between the words. It won't match something like using namespace std;. Note that namespaces can be nested, and this gives you only a flat list of all the names.
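As a quick way to try that expression outside Xojo, here is a small Python sketch. The function name is an assumption for illustration. It uses the regex exactly as suggested above, so it shares the same limits: no comments between tokens, no anonymous namespaces, and a flat list for nested ones.

```python
import re

# The expression from the answer above, unchanged.
NAMESPACE_RE = re.compile(r"namespace\s+(\w+)\s*\{")


def declared_namespaces(source):
    """Return the names of all namespaces opened in source, in order."""
    return NAMESPACE_RE.findall(source)
```

Because \s also matches newlines, a brace placed on the following line is matched as well; restrict the search to single lines if that is not wanted.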

How to modify C++ code from user-input

I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.
I want to replace anything of the form
A->Draw(B1, B2)
with
MyFunc(A, B1, B2).
My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression.
My next thought was to call clang as a subprocess, use "-dump-ast" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.
The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the ast in any way that I was able to find.
Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?
What you want is a Program Transformation System.
These are tools that generally let you express changes to source code, written in source level patterns that essentially say:
if you see *this*, replace it by *that*
but operating on Abstract Syntax Trees so the matching and replacement process is
far more trustworthy than what you get with string hacking.
Such tools have to have parsers for the source language of interest.
The source language being C++ makes this fairly difficult.
Clang sort of qualifies; after all it can parse C++. OP objects
it cannot do so without all the environment context. To the extent
that OP is typing (well-formed) program fragments (statements, etc.)
into the interpreter, Clang may [I don't have much experience with it
myself] have trouble getting focused on what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface syntax rewrite rules are convenient but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.
GCC with Melt sort of qualifies in the same way that Clang does.
I'm under the impression that Melt makes GCC at best a bit less
intolerable for this kind of work. YMMV.
Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations
on large scale C++ code bases.
DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have to decide how to resolve, see Why can't C++ be parsed with a LR(1) parser? for more discussion] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed) but that's unlikely to be a real problem for interactively entered code fragments.
DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text
to feed to the interpreter.
Where it shines in this case is OP can likely write most of his modifications directly as source-to-source syntax rules. For his
example, he can provide DMS with a rewrite rule (untested but pretty close to right):
rule replace_Draw(A:primary,B1:expression,B2:expression):
primary->primary
"\A->Draw(\B1, \B2)" -- pattern
rewrites to
"MyFunc(\A, \B1, \B2)"; -- replacement
and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
If OP provides a set of such rules, DMS can be asked to apply the entire set.
So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources, so I think it's a question of how badly OP needs to do this.
Try modifying the headers to suppress the method; when you then compile, the errors will point to every use, and you will be able to replace them all.
Since you have a C++ interpreter (like CERN's ROOT), I guess you must use the compiler to intercept all the Draw calls. An easy and clean way to do that is to declare the Draw method as private in the headers, using some defines:
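class ItemWithDrawMethod
{
    ....
public:
#ifdef CATCHTHEMETHOD
private:   // hide Draw so that every external call fails to compile
#endif
    void Draw(A, B);
#ifdef CATCHTHEMETHOD
public:    // restore public access for the remaining members
#endif
    ....
};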
Then compile as:
gcc -DCATCHTHEMETHOD=1 yourfilein.cpp
If the user wants to input complex algorithms into the application, I suggest integrating a scripting language into the app. The user can then write code (a function or algorithm in a defined form) that the app executes in the interpreter to get the final results. Examples: Python, Perl, JS, etc.
Since you need C++ in the interpreter http://chaiscript.com/ would be a suggestion.
What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.
Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there. Then your problem reduces to having your users use your new improved class.
You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.
Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.
There may be a way to accomplish this mostly with regular expressions.
Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.
Fundamentally, the part that matters is the "SYMBOL->Draw("
SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.
For the first case, use a simple regular expression that searches for any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_\.]*", followed by the literal text "->Draw(". This will give you the portion that must be rewritten, since the code following this part is already formatted as valid C++ parameters.
The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.
As soon as you have the symbol or expression, the replacement is trivial, going from
SYMBOL->Draw(...
to
YourFunction(SYMBOL, ...
without having to deal with the additional parameters to Draw().
As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as
A->Draw(B...)->Draw(C...)
The first iteration identifies the first A->Draw( and rewrites the whole statement as
YourFunction(A, B...)->Draw(C...)
which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as
YourFunction(YourFunction(A, B...), C...)
where B... and C... are well-formed C++ parameters, including nested calls.
Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
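Still, the backward walk described above can be illustrated with a rough sketch. This one is written in Python purely for illustration. The function names are hypothetical, the replacement name MyFunc is taken from the question, and the sketch assumes the receiver expression contains no string literals or comments with unbalanced brackets.

```python
MARKER = "->Draw("


def receiver_start(code, end):
    """Walk backward from index end; return where the receiver expression starts."""
    i = end
    while i > 0:
        ch = code[i - 1]
        if ch in ")]":
            # Skip a balanced (...) or [...] group, counting nesting depth.
            depth = 0
            while i > 0:
                c = code[i - 1]
                if c in ")]":
                    depth += 1
                elif c in "([":
                    depth -= 1
                i -= 1
                if depth == 0:
                    break
        elif ch.isalnum() or ch in "_.":
            i -= 1  # part of an identifier or a member access
        elif i >= 2 and code[i - 2:i] in ("->", "::"):
            i -= 2  # pointer access or scope qualifier
        else:
            break
    return i


def rewrite_draw(code, func="MyFunc"):
    """Rewrite every X->Draw(args) in code as func(X, args), left to right."""
    while True:
        idx = code.find(MARKER)
        if idx < 0:
            return code
        start = receiver_start(code, idx)
        rest = code[idx + len(MARKER):]
        # No comma when Draw() is called without arguments.
        sep = "" if rest.lstrip().startswith(")") else ", "
        code = f"{code[:start]}{func}({code[start:idx]}{sep}{rest}"
```

Each pass removes one "->Draw(" from the text, so the loop terminates, and chained calls are handled by the repeated search exactly as in the A->Draw(B...)->Draw(C...) walkthrough.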
One way is to load user code as a DLL (something like a plugin). This way you don't need to recompile your actual application; only the user code is compiled, and your application loads it dynamically.

Changing a naming scheme in Eclipse

Is there a way to change variables' naming conventions in Eclipse (specifically Eclipse CDT)? For example, can I do a search-and-replace of variables with names like need_foo and change that to NeedFoo?
Adding and removing underscores is easy, obviously, but I don't see a way to change case. Perl's regexes have \u and \l modifiers to uppercase and lowercase characters, but Eclipse's apparently don't.
There's no automated way of mass-renaming multiple non-conforming function names in CDT. There is, however, a Code Analysis rule, which is designed to point out this sort of thing. It uses Eclipse Regex described in their online help. These will give you an "Information" level marker in your Problems View.
The way they work is matching a regex against each function name, and raising an error/warning/information if that doesn't return a match. You can access them via "Window->Preferences->C/C++->Code Analysis". It's about half way down the scroll list (in Eclipse Indigo).
To directly answer your second paragraph, Eclipse does not have an equivalent of Perl's \u and \l, the closest it has is (?ismd-ismd), which allows you to turn on matching based on case.
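Outside Eclipse, the case change itself is small. The following Python sketch shows one way to get the effect of Perl's \u by using a replacement function. The function names and the need_ prefix filter are assumptions for illustration. It is a blind textual substitution that knows nothing about scopes, so it would also rewrite matches inside strings and comments.

```python
import re


def snake_to_camel(name):
    """Convert need_foo to NeedFoo: uppercase the first letter and each letter after '_'."""
    return re.sub(r"(?:^|_)([a-z])", lambda m: m.group(1).upper(), name)


def rename_identifiers(text, prefix="need_"):
    """Rewrite every identifier starting with prefix; other names are left alone."""
    pattern = re.compile(rf"\b{re.escape(prefix)}[a-z0-9_]+\b")
    return pattern.sub(lambda m: snake_to_camel(m.group(0)), text)
```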
Depending on if the Code Analysis tool returns 5 or 50,000 errors:
Only a few function definitions to rename
You can use the standard refactoring renaming tool. Right click the function name, "Refactor->Rename" will replace all references to that function with your new function name (respecting different scopes).
Many many errors
This is... not as nice. Seeing as there's no built in method, you need to do it externally. The first approach I'd take is to see if there's an existing plugin out there that would do this.
Failing that, you could perhaps use the Code Analysis tool to identify non-conforming function names, and then use its output as input to a custom Perl script. You could do it in a few stages:
void FooBar(void) {}
int main(int argc, char *argv[])
{
FooBar();
return 0;
}
1) Run the code analysis, and copy and paste the warnings into a text file:
Bad function name "FooBar" (pattern /^(?-i:([a-z]+_[A-Z])|[a-z])(?i:[a-z]*)\z/) main.c /TestMultiThread line 47 Code Analysis Problem
2) Change the function definition, FooBar() --> fooBar(), using the above warnings as input for a Perl script (note that you have the non-conforming function name, the file name and the line number).
3) Compile it, and then use the compiler's undefined-reference errors for FooBar() to rename any remaining references:
undefined reference to `FooBar' main.c /TestMultiThread line 50 C/C++ Problem
This method would have some shortcomings, such as the compiler giving a partial list of undefined references due to the compilation terminating 'early', in which case you'd want to run it multiple times.
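The first two stages can be sketched as follows, in Python rather than Perl. It assumes the warning text looks exactly like the pasted line above; the real Problems View export may use different separators. The function names are hypothetical, and lowering the first letter is just one possible way to produce a conforming name.

```python
import re

# Assumed layout: message, resource, path, "line N", taken from the sample warning.
WARNING_RE = re.compile(
    r'Bad function name "(?P<name>\w+)" \(pattern .*\)\s+'
    r'(?P<file>\S+)\s+(?P<path>\S+)\s+line (?P<line>\d+)'
)


def parse_warning(line):
    """Return (function name, file name, line number), or None if not a match."""
    m = WARNING_RE.search(line)
    if m is None:
        return None
    return m.group("name"), m.group("file"), int(m.group("line"))


def lower_first(name):
    """Derive a candidate new name by lowercasing the first letter."""
    return name[:1].lower() + name[1:]
```

With the sample warning this yields FooBar, main.c and 47, and lower_first turns FooBar into fooBar, which is what a script would then substitute at that file and line.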
Another thing to look at is Refactor Scripts (Refactor --> Apply Script), but from the little I've seen of that, I don't think it's going to do what you want.
All in all, I've found the refactoring tools in Eclipse CDT to be nowhere near as powerful as those for Java (from what I remember). Still better than those in Notepad though (also, not bashing CDT, it's an awesome development environment!)

Highlight arguments in function body in vim

A little something that could be borrowed from IDEs. So the idea would be to highlight function arguments (and maybe scoped variable names) inside function bodies. This is the default behaviour for some C:
Well, if I were to place the cursor inside func I would like to see the arguments foo and bar highlighted to follow the algorithm logic better. Notice that the similarly named foo in func2 wouldn't get highlit. This luxury could be omitted though...
Using locally scoped variables, I would also like to have locally initialized variables highlit:
Finally to redemonstrate the luxury:
Not so trivial to write this. I used the C to give a general idea. Really I could use this for Scheme/Clojure programming better:
This should recognize let, loop, for, doseq bindings for instance.
My vimscript-fu isn't that strong; I suspect we would need to:
parse (non-regexply?) the arguments from the function definition under the cursor. This would be language specific of course. My priority would be Clojure.
define a syntax region to cover the given function/scope only
give the required syntax matches
As a function this could be mapped to a key (if very resource intensive) or CursorMoved if not so slow.
Okay, now. Has anyone written/found something like this? Do the vimscript gurus have an idea on how to actually start writing such a script?
Sorry about slight offtopicness and bad formatting. Feel free to edit/format. Or vote to close.
This is much harder than it sounds, and borderline-impossible with the vimscript API as it stands, because you don't just need to parse the file; if you want it to work well, you need to parse the file incrementally. That's why regular syntax files are limited to what you can do with regexes - when you change a few characters, vim can figure out what's changed in the syntax highlighting, without redoing the whole file.
The vim syntax highlighter is limited to dealing with regexes, but if you're hellbent on doing this, you can roll your own parser in vimscript, and have it generate a buffer-local syntax that refers to tokens in the file by line and column, using the \%l and \%c atoms in a regex. This would have to be rerun after every change. Older versions of Vim have no autocmd for "file changed" (newer ones provide TextChanged and TextChangedI), but there is the CursorHold autocmd, which runs when you've been idle for a configurable duration.
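To show what such generated, position-based patterns might look like, here is a sketch in Python instead of vimscript. It takes the lines of a buffer, the argument names (assumed to be already extracted by some parser) and the line range of the function body, and produces syntax match commands that use \%l and \%c. The group name FunctionArgument and the function name are assumptions, and the names are assumed to be plain ASCII identifiers so that character offsets equal Vim's byte columns.

```python
import re


def vim_match_commands(lines, names, first_line, last_line, group="FunctionArgument"):
    """Build one Vim 'syntax match' command per occurrence of each name.

    lines is the buffer as a list of strings; line numbers are 1-based and
    the range is inclusive, mirroring Vim's own numbering.
    """
    commands = []
    for lnum in range(first_line, last_line + 1):
        text = lines[lnum - 1]
        for name in names:
            # Whole-symbol matches only; '-' counts as a symbol character (Lisp style).
            for m in re.finditer(rf"(?<![\w-]){re.escape(name)}(?![\w-])", text):
                col = m.start() + 1  # Vim columns start at 1
                commands.append(rf"syntax match {group} /\%{lnum}l\%{col}c{name}/")
    return commands
```

The resulting strings would have to be executed inside Vim, and regenerated whenever the buffer changes, which is the cost the answer above warns about.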
One possible solution can be found here. Not the best way because it highlights every occurrence in the whole file and you have to give the command every time (probably the second one can be avoided, don't know about the first). Give it a look though.

What are Clojure's Naming Conventions?

Can anyone explain or point me to where I can find clojure's naming conventions for:
File names
Functions (From what I understand, function names are simply dash separated values)
Variables
You might want to look at the Clojure library coding standards on the developer Wiki - this is probably the most comprehensive list that I've seen.
Update: link above seems to be dead, consider instead: https://clojure.org/dev/contrib_howto#_coding_guidelines
To your specific points:
File names are lowercase, and stored in a directory structure to match the namespace, and end in .clj e.g. "my/special/namespace.clj".
Functions are dash-separated-lowercase-words, ideally descriptively chosen so that your code is clear and self-documenting. Don't be afraid to re-use good function names in different namespaces (that is what namespaces are for!).
Variables (by which I assume you mean parameters, let-bound variables etc.) are also usually dash-separated-lowercase-words. Since code-is-data, I think it is appropriate that functions and data have the same naming convention :-)
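The file and symbol conventions above can be summarised in a short sketch, written in Python only for illustration. The function names and the exact kebab-case regex are assumptions. The one detail not spelled out above is that hyphens in a namespace name become underscores in the file path.

```python
import re

# Assumed shape of a conventional name: lowercase words joined by dashes,
# optionally ending in ? or ! (as in empty? or swap!).
KEBAB_RE = re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*[?!]?")


def is_kebab_case(name):
    """True if name follows the dash-separated-lowercase-words convention."""
    return KEBAB_RE.fullmatch(name) is not None


def namespace_to_path(ns):
    """Map a namespace such as my.special.namespace to its source file path."""
    return ns.replace("-", "_").replace(".", "/") + ".clj"
```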
You might want to take a look at this unofficial style guide.
There are some interesting guidelines on naming written by Stuart Sierra which suggest that:
pure functions should be nouns describing the return value (age instead of calculate-age)
side-effecting functions should be verbs describing the action (create- for constructing and get- for retrieving), reserving the bang (as in swap!) for changes to mutable references.
verbs that can also be nouns should be distinguished as verb phrases (send-message instead of message)
coercions should name the output type without an arrow prefix (connection instead of ->connection) except when the input type must be explicit (input-type->output-type)
namespace aliases can save on repetition (products/price instead of products/product-price) and prevent local clashes in let bindings
functions returning functions should have the -fn suffix
There is an interesting set of naming conventions documented in a comment by
Taoensso in his
Encore library.
He proposes names using ! for side-effects, ? for booleans,
$ for expensive operations, _ as dereffable,
* for macros; plus a few other combos.
Even though you didn't ask for it explicitly, I'll explain what I've seen for protocol naming conventions.
Typically, the name starts with an uppercase "I" and then the rest is camel case, where the first letter of each word is capitalized, and the rest is lower case. For example, if I wanted to define a protocol for rocket ships, I'd use the name IRocketShip.
I've also seen 'A' instead of 'I' used, probably to represent the word 'abstract'.