Need regex to locate C++ namespace declarations with unknown names

I'm trying to understand open source C++ code and I need a way to create a list of all the declared namespaces. I'm writing my code in Xojo (REALbasic), which has built-in regex handling.
My problem is I'm not familiar enough with regular expressions to construct the correct expression to locate "namespace " followed by an unknown name then " {" all on the same line of text.
I can code everything else myself, I just need the proper regular expression. All help appreciated.

You may try namespace\s+(\w+)\s*\{ for the most common cases, where there are no comments between the words. It won't match something like using namespace std;. Note that namespaces can be nested, but with this expression you'll get only a flat list of all names.
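To see how the suggested expression behaves, here is a minimal sketch in Python rather than Xojo (the helper name list_namespaces and the sample text are made up for illustration). It assumes comments have not been placed between the keyword, the name, and the brace.

```python
import re

# The pattern from the answer: "namespace", a name, then an opening brace.
NAMESPACE_RE = re.compile(r"namespace\s+(\w+)\s*\{")

def list_namespaces(source):
    """Return namespace names in order of appearance (flat list, nesting ignored)."""
    return NAMESPACE_RE.findall(source)

# Hypothetical sample input, not taken from any real project.
sample = """
using namespace std;
namespace outer {
namespace inner {
}
}
"""

print(list_namespaces(sample))  # prints ['outer', 'inner']
```

Note that \s also matches line breaks, so a brace on the following line is accepted too. If the brace must be on the same line, replacing \s with [ \t] is one option.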

Related

LEX? Shared regular expression

I am working with LEX and YACC. I have a question about how to define tokens: I have two regular expressions which share some characters, see the example below:
SHARED "+"|"-"|"/"|"="|"%"|"?"|"!"|"."|"$"|"_"|"~"|"&"|"^"|"<"|">"|"("|")"|","
REXP_1 {SHARED}|[a-zA-Z]|[ \t]+|[\\][\\\"]
REXP_2 {SHARED}|[a-zA-Z]|[ \t]+|"*"
Now my question is how to identify whether a character from the shared regular expression corresponds to REXP_1 or REXP_2 when I define the tokens in the third section of the .lex file.
I think I am misunderstanding something. I guess that the way I write the regular expressions is wrong, but I cannot find a better way to write them. Could you please give me some hints?
Moreover, I would appreciate it if someone could advise me on criteria for deciding when to define a token (file.lex) and when to define a symbol in the grammar (file.y). For some symbols it is easy to figure out whether they are tokens or grammar symbols, but for others I find it difficult to decide where to put them.
By the way I am working with this grammar
(Answered in a question edit)
The OP wrote:
Just in case someone finds it interesting, I am going to write out the lessons I learned. I think the most important lesson is that common sense is a great tool for figuring out what is an internal token in the .lex file and what is a suitable token to share with the .y file.
Since the term 'common sense' may be a bit ambiguous I post the following example:
ALPHA_NUMERIC [a-zA-Z0-9]
SQ_CHAR {SHARED}|{ALPHA_NUMERIC}
SINGLE_QUOTED {SINGLE_QUOTE}{SQ_CHAR}{SQ_CHAR}*{SINGLE_QUOTE}
where ALPHA_NUMERIC is a good internal token (file.lex) but a bad token to share with the grammar file, whereas SINGLE_QUOTED may be a good token to share with the grammar (file.y). I wrote 'may be' because it depends heavily on the specific grammar we are working on; in my concrete case it is a good token to share with the YACC file.
What I did was define as a token in file.lex a regexp similar to the one @OGHaza suggested, and then use it in the grammar itself (file.y).

Vim syntax: Matching namespace qualified symbols

I'm using vim-clojure-static, a Clojure plugin. I can add my own functions and macros to a syntax group by doing for example:
syntax keyword clojureMacro defsystem
But in Clojure, after one has required (imported) code from other namespaces, one has to namespace-qualify the functions and macros. For example, if I required the namespace my-namespace and defsystem was in my-namespace, I would have to refer to it as my-namespace/defsystem. But when requiring a namespace, one may shorten (actually rename) the namespace name, giving, for example, my/defsystem.
So, the problem: the syntax keyword clojureMacro defsystem does not work if defsystem is namespace-qualified, like my/defsystem. And the namespace qualifier can be anything. How can I fix that? The regex '\m[a-z.+\-*_!?]\+\/' matches namespace qualifiers. So basically I want code that matches '\m[a-z.+\-*_!?]\+\/' immediately followed by a clojureMacro to be highlighted as if the whole thing were a clojureMacro.
It doesn't really answer your question, but anyhow.
Back in the day, VimClojure provided so-called "dynamic highlighting." It would inspect the required namespaces (and their aliases) and dynamically add the symbols of the referenced namespaces to the highlighting. So if you typed "m/defsystem" it would be highlighted, but "m/non-existant" or "not-m/defsystem" would not. The highlighting respected whether a symbol was a macro, a function, etc. Should you ever change the namespace alias from "m" to something else, the highlighting would (almost) automatically adapt. However, it needed a backend server running.
Maybe you want to ping the guys on the vimclojure google group. It's all about vim and clojure. Maybe someone there is willing to lend a hand and carry over this functionality as a fireplace extension.
For anyone out there still looking for a solution, I've reimplemented VimClojure's dynamic highlighting feature as a fireplace.vim plugin:
https://github.com/guns/vim-clojure-highlight

Making a parser to extract function name, parameters, return type

I need to parse a C++ class file (.h) and extract the following information:
Function names
Return types
List of parameter types of each function
Assume that there is a special tag using which I can recognize if I need to parse a function or not.
For example:
#include <someHeader>
class Test
{
public:
Test();
void fun1();
// *Expose* //
void fun2();
};
So I need to parse only fun2().
I read the basic grammar here, but found it too complex to comprehend.
Q1. I can't make out how complex this task is. Can someone provide a simpler grammar for a function declaration to perform this parsing?
Q2. Is my approach right or should I consider using some library rather than reinventing?
Edit: Just to clarify, I don't have a problem with parsing itself; the problem is more one of understanding the grammar I need to parse.
A C++ header may include arbitrary C++ code. Hence, parsing the header might be as hard as parsing all kinds of C++ code.
Your task becomes easier if you can make certain assumptions about your header file. For instance, if you always have an EXPOSE tag in front of your function and the functions are always on a single line, you could first grep for those lines:
grep -i -A1 EXPOSE <files>
And then you could apply a regular expression to filter out the information you need.
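The grep-then-regex idea can be sketched in a few lines of Python. This is only an illustration under the same assumptions as above: the tag sits on the line directly before the declaration, and each declaration fits on one line. The pattern DECL_RE and the helper exposed_functions are hypothetical names, and the pattern covers only simple declarations (no function pointers, default arguments containing parentheses, or templates with commas).

```python
import re

# Hypothetical pattern for a simple one-line declaration:
#   <return type> <name>(<parameters>);
DECL_RE = re.compile(
    r"^\s*(?P<ret>[\w:<>\s\*&]+?)\s+(?P<name>\w+)\s*\((?P<params>[^)]*)\)\s*;"
)

def exposed_functions(header_text, tag="*Expose*"):
    """Return (return_type, name, parameter_list) for each tagged declaration."""
    results = []
    lines = header_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if tag in line:
            # Like grep -A1: look only at the line right after the tag.
            m = DECL_RE.match(lines[i + 1])
            if m:
                params = [p.strip() for p in m.group("params").split(",") if p.strip()]
                results.append((m.group("ret").strip(), m.group("name"), params))
    return results

# Sample header based on the question, with one extra made-up declaration.
header = """class Test
{
public:
Test();
void fun1();
// *Expose* //
void fun2();
// *Expose* //
int add(int a, const char* b);
};
"""

for ret, name, params in exposed_functions(header):
    print(ret, name, params)
```

Each parameter is kept as written (type plus name). Separating the type from the name reliably is where a real parser starts to pay off.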
Nevertheless, I'd recommend using existing tools. This seems to be a tutorial on how to do it with clang and Python.
GCC XML is an open source tool that emits the AST (Abstract Syntax Tree). See this other answer where I posted about the usage I made of it.
You should consider using it only if you are proficient with (or keen to learn) an XML analyzer for inspecting the AST. It's a fairly complex structure...
You will still need to 'grep' for the comments identifying your required snippets, as comments are lost in the output XML.
If you are doing this just for documentation, Doxygen could be a good bet.
Either way it may give you some pointers as to how to do this.

Highlight arguments in function body in vim

A little something that could be borrowed from IDEs. So the idea would be to highlight function arguments (and maybe scoped variable names) inside function bodies. This is the default behaviour for some C:
Well, if I were to place the cursor inside func, I would like to see the arguments foo and bar highlighted so I can follow the algorithm logic better. Notice that the similarly named foo in func2 wouldn't get highlighted. This luxury could be omitted though...
Using locally scoped variables, I would also like to have locally initialized variables highlighted:
Finally to redemonstrate the luxury:
Not so trivial to write this. I used the C to give a general idea. Really I could use this for Scheme/Clojure programming better:
This should recognize let, loop, for, doseq bindings for instance.
My vimscript-fu isn't that strong; I suspect we would need to:
Parse (non-regexply?) the arguments from the function definition under the cursor. This would be language specific of course. My priority would be Clojure.
Define a syntax region to cover the given function/scope only.
Give the required syntax matches.
As a function this could be mapped to a key (if very resource intensive) or CursorMoved if not so slow.
Okay, now. Has anyone written/found something like this? Do the vimscript gurus have an idea on how to actually start writing such a script?
Sorry about slight offtopicness and bad formatting. Feel free to edit/format. Or vote to close.
This is much harder than it sounds, and borderline-impossible with the vimscript API as it stands, because you don't just need to parse the file; if you want it to work well, you need to parse the file incrementally. That's why regular syntax files are limited to what you can do with regexes - when you change a few characters, vim can figure out what's changed in the syntax highlighting, without redoing the whole file.
The vim syntax highlighter is limited to dealing with regexes, but if you're hellbent on doing this, you can roll your own parser in vimscript, and have it generate a buffer-local syntax that refers to tokens in the file by line and column, using the \%l and \%c atoms in a regex. This would have to be rerun after every change. Unfortunately there's no autocmd for "file changed", but there is the CursorHold autocmd, which runs when you've been idle for a configurable duration.
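As a rough illustration of the line-and-column idea, the following Python sketch (run outside Vim) generates syntax match commands that pin each occurrence of a name with the \%l and \%c atoms. The group name FuncArg, the helper highlight_commands, and the sample lines are hypothetical, and the sketch assumes ASCII text so that character offsets equal the byte columns that \%c counts. Finding the argument names and the function's line range is the hard, language-specific part and is not attempted here.

```python
import re

def highlight_commands(lines, names, first_line=1, group="FuncArg"):
    """Build one Vim 'syntax match' command per occurrence of each name.

    lines      -- the text lines of the function under the cursor
    names      -- identifiers to highlight (for example, the argument names)
    first_line -- buffer line number of lines[0]
    group      -- hypothetical highlight group name
    """
    commands = []
    for lnum, text in enumerate(lines, start=first_line):
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                col = m.start() + 1  # Vim columns are 1-based
                # \%{lnum}l\%{col}c anchors the match; .\{n} spans the name.
                pat = "\\%" + str(lnum) + "l\\%" + str(col) + "c.\\{" + str(len(name)) + "}"
                commands.append("syntax match " + group + " /" + pat + "/")
    return commands

# Hypothetical function body, standing in for the code under the cursor.
body = [
    "int func(int foo, int bar) {",
    "  return foo + bar;",
    "}",
]

for cmd in highlight_commands(body, ["foo", "bar"]):
    print(cmd)
```

The generated commands would have to be cleared and rebuilt whenever the buffer changes, which is exactly the cost described above.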
One possible solution can be found here. Not the best way because it highlights every occurrence in the whole file and you have to give the command every time (probably the second one can be avoided, don't know about the first). Give it a look though.

How to create regexp parsing pascal-like function declaration with body?

How can I create a regexp that parses a Pascal-like function declaration together with its body (and is this even possible)?
I've created this regexp:
function\s+(\w+)(\(((((var\s*)?(\w+)(\s*\,+\s*)?)+?\s*\:\s*(\w+)\s*\;?\s*?)\s*)+\))?\s*\:\s*(\w+)
which can pull out only function prototypes (it works only if there are no comments, so I strip comments before parsing), and I have no idea how to change it so that it pulls out functions with their bodies. The problem is that there can be many "begin - end" blocks, so it is hard to find where a function ends.
Sorry, but you are using the wrong tool. Programming languages have a context-free structure that regular expressions simply cannot recognize reliably. Properly nested parentheses like { () [] } { } are an example of such a context-free structure: you cannot find a regular expression that checks the proper nesting.
To solve the problem, you could use regular expressions to break the program code down into a stream of tokens and then use a (manually coded) top-down parser to check the structure of this token stream. To learn about this, consult any book about compiler design. Scanning (breaking into tokens) and parsing (checking structure) are always the first chapters. The Wikipedia entry for a top-down parser provides an example.
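A minimal sketch of this scanner-plus-parser split, written in Python, might look as follows. It is heavily simplified and rests on several assumptions: comments and string literals have already been removed, only begin/end pairs open and close blocks (constructs such as case ... end or record ... end would need extra rules), and functions are not nested. All names (tokenize, parse_block, find_functions) are hypothetical.

```python
import re

# Scanner: identifiers/keywords, numbers, ':=' and any other single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|:=|\S")

def tokenize(source):
    """Break the source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)

def parse_block(tokens, pos):
    """tokens[pos] must be 'begin'; return the index just past its matching 'end'."""
    if tokens[pos].lower() != "begin":
        raise SyntaxError("expected 'begin'")
    pos += 1
    while pos < len(tokens):
        word = tokens[pos].lower()
        if word == "begin":
            pos = parse_block(tokens, pos)  # nested block: recurse
        elif word == "end":
            return pos + 1
        else:
            pos += 1
    raise SyntaxError("missing 'end'")

def find_functions(source):
    """Return (name, body_tokens) for every function that has a begin/end body."""
    tokens = tokenize(source)
    found = []
    pos = 0
    while pos < len(tokens):
        if tokens[pos].lower() == "function" and pos + 1 < len(tokens):
            name = tokens[pos + 1]
            start = pos + 2
            # Skip the header until the body starts or another function begins.
            while start < len(tokens) and tokens[start].lower() not in ("begin", "function"):
                start += 1
            if start < len(tokens) and tokens[start].lower() == "begin":
                end = parse_block(tokens, start)
                found.append((name, tokens[start:end]))
                pos = end
            else:
                pos = start  # prototype only; continue scanning from here
        else:
            pos += 1
    return found

# Hypothetical sample input with nested begin/end blocks.
sample = """
function Max(a, b: integer): integer;
begin
  if a > b then
  begin
    Max := a
  end
  else
  begin
    Max := b
  end
end;
"""

for name, body in find_functions(sample):
    print(name, " ".join(body))
```

The recursion in parse_block is what a single regular expression cannot express: each nested begin is matched with its own end before the outer block is allowed to close.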