Is there any alternative to python ast module in Crystal? - crystal-lang

Does Crystal expose its internal parser as a standard library, as Python does with its ast module? How can I parse Crystal source code and get its AST?

Yes! In fact it ships the entire compiler in the stdlib. So we can just access the parser to get an AST:
require "compiler/crystal/syntax/*"
root = Crystal::Parser.new(%(
class Foo
  def hello
    :world
  end
end)).parse
The official docs do not include Crystal::ASTNode and its subclasses; the ones you find in the docs are those exposed to the macro language, and they differ slightly. So you'll have to dive into the source code to see how to make use of the AST.
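For comparison, here is roughly what the question has in mind on the Python side. It is a minimal sketch using only the standard-library ast module; the sample source and the helper name list_definitions are made up for illustration and mirror the Crystal snippet above.

```python
import ast

# Sample input, mirroring the Crystal class above (illustrative only).
SOURCE = '''
class Foo:
    def hello(self):
        return "world"
'''


def list_definitions(source: str) -> list:
    """Return (kind, name, line) for every class and function in source."""
    tree = ast.parse(source)
    found = []
    # ast.walk visits every node in the tree, starting at the module root.
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found.append(("class", node.name, node.lineno))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(("def", node.name, node.lineno))
    return found


if __name__ == "__main__":
    for kind, name, line in list_definitions(SOURCE):
        print(f"{kind} {name} (line {line})")
```

The Crystal parser gives you the same kind of node tree, only with Crystal::ASTNode subclasses in place of ast.ClassDef and friends.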

Related

Ignore missing headers with clang AST parser

I'm on Windows, using MSVC to compile my project, but I need clang for its neat AST parser, which allows me to write a little code generator.
Problem is, clang cannot parse MSVC headers (a very well-known and understandable problem).
I tried two options:
Including the MSVC header folder: parsing the built-in headers included by my code eventually leads to a fatal error, which prevents me from parsing the parts I want correctly.
Not providing any built-in headers at all and forward-declaring the types I need: this is what I did before. It worked fine, but it no longer does with the latest Clang. I don't know whether the parser's policy on missing headers changed, but it now fails completely whenever something like <string> is included, and not much gets parsed.
I am using the Python bindings (libclang), but I would consider switching to the C/C++ API if there is a solution there.
Is there any way I can alter this behavior and make clang continue parsing even when some headers are not found?
Use SetSuppressIncludeNotFoundError. Took me an hour to find! You can imagine how glad I was to find it!
https://clang.llvm.org/doxygen/classclang_1_1Preprocessor.html#ac7bafe67fc32e41460855b39d20ff6af
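One way to ignore the errors due to missing headers is to call SetSuppressIncludeNotFoundError(true) on the preprocessor in your ASTFrontendAction subclass. An example is given below.
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendAction.h"
#include "clang/Lex/Preprocessor.h"
// The class name CustomFrontendAction is a placeholder; the original class header was lost.
class CustomFrontendAction : public clang::ASTFrontendAction
{
public:
    virtual std::unique_ptr<clang::ASTConsumer> CreateASTConsumer(
        clang::CompilerInstance &Compiler, llvm::StringRef InFile)
    {
        // Keep parsing even when an #include cannot be resolved.
        Compiler.getPreprocessor().SetSuppressIncludeNotFoundError(true);
        return std::unique_ptr<clang::ASTConsumer>(
            new CustomASTConsumer(&Compiler.getASTContext()));
    }
};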
For a complete example using ASTFrontendAction, see https://clang.llvm.org/docs/RAVFrontendAction.html
So you want to process C++ code that uses MS headers, and you want access to ASTs so that you can generate code. And Clang won't handle MS headers.
So Clang can't be the answer unless it gets a radical upgrade.
You asked for "any solution that can make this work".
Our DMS Software Reengineering Toolkit with its C++14 Front End can do this.
DMS provides general parsing, AST construction/inspection/transformation/generation, and inverse parsing (conversion of ASTs back into compilable code), parameterized by language definitions.
The C++ front end provides a full C++14 parser, preprocessor handling, AST construction, and full name and type resolution. It has been tested with GCC and MS VS 2013 header files; we're testing with 2015 header files now.
(It also handles MS VS 2013 syntax.)
It handles the tough parsing cases completely, including C++'s famous "most vexing parse". You can see parse trees at "get human readable AST from c++ code".
DMS does not provide Python bindings, nor a direct C++ interface. Rather, it is a standalone tool designed to support the construction of custom tools (e.g., your "little code generator"). It has its own very extensive set of internal APIs, coded in the LISP-like metaprogramming language PARLANSE. Other aspects of DMS are managed by using DSLs for lexers, grammars, and transformations. See below.
A word of caution: any tool that can process C++ is guaranteed to be complex. DMS is correspondingly complex, and it takes a while to learn to use it, so you're not going to get instant answers. The good news here is that some things are easier to do. Your code generation problem is likely "read a skeleton file, and then replace key entries in it with problem-specific code". If that's the case, a DMS tool with the following code (simplified for presentation here) will likely do the trick:
...
(= myAST (Registry:ParseFile (. filename) (. `CppVisualStudio2013') ...)
(Registry:ApplyTransforms myAST (. `MyTransforms.rsl'))
(Registry:PrettyPrint myAST (concat filename `.modified'))
...
with a transforms file MyTransforms.rsl containing source-to-source surface-syntax (e.g., C++ syntax) transformation rules of the conceptual form
rule rulename if_you_see THIS then replace_by ("-->") THAT
An actual C++ rule might look like this (I'm making it up because I don't know your actual code generation goals):
rule replace_abstraction(s: STRING_LITERAL):
" abstraction_place_holder(\s) "
-> " my_DSL_library(\s,17); "
The ApplyTransforms call above will apply all the rules in this file until none apply any further.
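The "apply until none apply any further" behavior can be illustrated outside DMS with a rough Python analogue. Note the swap: this sketch rewrites plain text with regular expressions, whereas DMS matches rules against the AST. The rule table, the function name apply_until_fixpoint and the pass limit are all invented for the example.

```python
import re

# Hypothetical rule table of (pattern, replacement) pairs. It imitates the
# made-up replace_abstraction rule above on raw text, not on a syntax tree.
RULES = [
    (re.compile(r'abstraction_place_holder\(\s*("(?:[^"\\]|\\.)*")\s*\)'),
     r'my_DSL_library(\1,17)'),
]


def apply_until_fixpoint(text: str, rules=RULES, max_passes: int = 100) -> str:
    """Apply every rule repeatedly until a full pass changes nothing."""
    for _ in range(max_passes):
        new_text = text
        for pattern, replacement in rules:
            new_text = pattern.sub(replacement, new_text)
        if new_text == text:
            # Fixpoint reached: no rule applied during this pass.
            return new_text
        text = new_text
    raise RuntimeError("rules did not converge; they may rewrite in a cycle")


if __name__ == "__main__":
    print(apply_until_fixpoint('x = abstraction_place_holder("widget");'))
```

The pass limit is there because a careless rule set can rewrite its own output forever; a text-based version like this will also happily match inside comments and string literals, which is one reason tree-based rules are preferable.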
Writing surface syntax transforms, where you can do it, is way easier than making calls on a procedure library (which, like Clang, DMS offers) that hack at the tree.
You can write more complex metaprograms using PARLANSE to apply some rules in one place, other rules someplace else, and you can mix source-to-source transforms with procedural transforms that hack directly at the tree if you want.
If you want more details on what transforms look like, ask and I'll provide a link.

clang libTooling: How to find which header an AST item came out of?

Examples found on the web for clang tools are always run on toy examples, which are usually all really trivial C programs.
I am building a tool which performs source-to-source transformations on C++ code, which is obviously a very, very challenging task, but clang is up to this task.
The issue I am facing now is that the AST that clang generates for any C++ code that utilizes the STL is enormous. For example I have some C++ code for which clang++ -ast-dump ... | wc -l is 67,018 lines of horrifying AST gobbledygook!
99% of this is standard library stuff, which I aim to ignore in my source-to-source metaprogramming task. So, to achieve this, I simply want to filter out files. Suppose I want to look at only the class definitions in the headers of the project that I'm analyzing (and ignore everything from the standard library headers): I need to figure out which header each of my CXXRecordDecls came from!
Can this be done?
Edit: Hopefully this is a way to go about it. Trying this out now... The important bit is that it has to tell me the header that the decls came out of, not the cpp file corresponding to the translation unit.
In my experience so far, the "source" of some given AST node is best retrieved by using Locations. For example every node at least has a start location, and when you print this out it will contain the header file path.
Then it's possible to use this path to decide whether it is a system library or part of your application code that you still are interested in examining.
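As a rough illustration of that last step, here is a Python sketch that classifies a file path taken from a node's location. It does not call Clang at all; the function name is_project_header, the project root and the sample paths are assumptions made up for the example.

```python
from pathlib import PurePosixPath


def is_project_header(location_path: str, project_roots: list) -> bool:
    """True if the file named in a source location lies under a project root."""
    path = PurePosixPath(location_path)
    for root in project_roots:
        try:
            # relative_to raises ValueError when path is not inside root.
            path.relative_to(PurePosixPath(root))
            return True
        except ValueError:
            continue  # not under this root, try the next one
    return False


if __name__ == "__main__":
    roots = ["/home/me/myproject"]  # hypothetical project location
    for p in ["/home/me/myproject/include/widget.hpp",
              "/usr/include/c++/9/string"]:
        print(p, "->", "keep" if is_project_header(p, roots) else "skip")
```

Paths should be normalized before the check if they may contain ".." components. Inside a libTooling tool, SourceManager::isInSystemHeader or the isExpansionInSystemHeader matcher may already cover the standard-library case without any path comparison.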
One route I'm looking at is to narrow matches with things like hasName() (as found here). For example:
recordDecl(hasName("MyBaseClass")) // etc.
However, the -ast-dump approach from your comment above is something I tried as well, to get a lay of the land for my own Clang tool. I found this post to be extremely helpful. Armed with its suggestion, I used clang-check to filter to a specific class name and fed it my top-level .cpp file. The output was a much more manageable few hundred lines representing the class declarations and definitions of interest.

parser generator that generates stand-alone C++ code

Is there a LALR parser generator that produces stand-alone C++ code? I am hoping that it would generate two files named something like "Parser.cpp" and "Parser.hpp," and the generated parser is implemented in a single class (that I can wrap in whatever namespace) that I can use for my parsing needs.
I want to use it for fun (i.e. small personal projects), and I'd like the output to be stand-alone (without any headers) so that I know I can compile it wherever I have a C++ compiler.
The search so far:
I've looked at flex/bison, but AFAIK they both require special headers and libraries. I've also looked at ANTLR a little bit, but it is not obvious to me that it can generate stand-alone C++ code. If someone can confirm that it can, then I might look more into it.
GOLD Parser (Bart Kiers mentioned the list on Wikipedia) supports C and C++. It does not generate a completely self-contained C/C++ source file, though; it only generates lexer/parser tables, which are then consumed by a "parsing engine".
To accomplish your task (or something similar) I did the following:
Prepare your LALR grammar in Gold's format
Generate parsing tables (one binary file)
Use an old trick to convert the binary file into a header file like
unsigned char ParseTable[] = { ... };
Modify the loader from the "parsing engine" sources (or use the C version which supports in-memory loading, as I remember)
Combine the sources for the GPEngine (if it is a C++ version) into the .h/.cpp pair.
Append the ParseTable to .cpp
Sure, it's not that straightforward, but all the steps can in principle be done within a single "combine" script which can be used with a number of grammars.
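The binary-to-header step of such a "combine" script could look something like the following Python sketch. The function names, the _len constant and the formatting are my own choices, not part of GOLD; the output is similar in spirit to what xxd -i produces.

```python
from pathlib import Path


def bytes_to_c_array(data: bytes, name: str = "ParseTable",
                     per_line: int = 12) -> str:
    """Render data as a C unsigned char array plus a length constant."""
    if not data:
        raise ValueError("refusing to emit an empty array initializer")
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (f"unsigned char {name}[] = {{\n{body}\n}};\n"
            f"unsigned int {name}_len = {len(data)};\n")


def write_header(table_path: str, header_path: str) -> None:
    """Read a binary parse table and write it out as C source text."""
    data = Path(table_path).read_bytes()
    Path(header_path).write_text(bytes_to_c_array(data))


if __name__ == "__main__":
    print(bytes_to_c_array(b"\x00\x01\xff", per_line=2))
```

Because the result is a definition rather than a declaration, it should end up in exactly one translation unit, which matches the "append the ParseTable to .cpp" step.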
I guess the major drawback is that GOLD is closed-source and Windows-only (which means you have to use a Windows machine to produce the parsing tables).
ANTLR can generate C++ code, although IMHO its C++ support is a bit weak; the output is more like C code. Still, it is a good environment to work with, and ANTLRWorks gives you a graphical representation of your syntax tree.
The output from flex+bison consists of two .c files and one .h file. These are completely stand-alone, in that they are all you need to compile into your application to make use of the parser. No additional libraries or headers are needed (besides the standard C ones).
Unless I've misunderstood your requirements, you definitely can do what you want with flex+bison.

Python code to parse and inspect c++

Is there a library for Python that will allow me to parse c++ code?
For example, let's say I want to parse some c++ code and find the names of all classes and their member functions/variables.
I can think of a few ways to hack it together using regular expressions, but if there is an existing library it would be more helpful.
In the past I've used gccxml (a C++ parser that emits easily parseable XML) for such purposes. I hacked up my own Python interfaces to it, but now there's pygccxml, which should package that up nicely for you.
Parsing C++ accurately is light-years from something you can do with a regular expression.
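To make that concrete, here is a small Python sketch of the kind of regular-expression hack the question has in mind, run against a deliberately awkward snippet. The pattern and the sample C++ are my own illustration.

```python
import re

# Naive pattern: the keyword "class" followed by an identifier.
NAIVE_CLASS = re.compile(r'\bclass\s+(\w+)')

CPP_SOURCE = r'''
// class Ghost is only mentioned in a comment
template <class T> struct Box { T value; };
const char *msg = "class Phantom";
class Real { int x; };
'''


def naive_class_names(source: str) -> list:
    """Return every identifier that follows the word 'class'."""
    return NAIVE_CLASS.findall(source)


if __name__ == "__main__":
    # Prints ['Ghost', 'T', 'Phantom', 'Real']
    print(naive_class_names(CPP_SOURCE))
```

Only one of the four hits is a real class, and the struct Box is missed entirely. Patching the pattern for comments, strings and templates still leaves macros, nested scopes and the preprocessor unhandled.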
You need a full C++ parser, and they're pretty hard to build. I've been involved in building one over several years, and track who is doing it; I don't know of any being attempted in Python.
The one I work on is DMS C++ Front End.
It provides not only parsing, but full name and type resolution. After parsing, you can basically extract detailed information about the code at whatever level of detail you like, including arbitrary details about function content.
You might consider using GCCXML, which does contain a parser, and will produce, I believe, the names of all classes, functions, and top-level variables. GCCXML won't give you any information about what's inside a function.
This is a little outside your question's scope perhaps... but depending on what you're trying to achieve, perhaps Exuberant Ctags is worth looking at.
I have not tried it, but the Python bindings for LLVM's Clang parser may work; see here.
How about pyparsing?

Programmatically parse and edit C++ Source Files

I want to programmatically parse and edit C++ source files. I need to change/add code in certain sections of code (i.e. in functions, class blocks, etc). I would also (preferably) be able to get comments as well.
Part of what I want to do can be explained by the following piece of code:
CPlusPlusSourceParser cp = new CPlusPlusSourceParser("x.cpp"); // Create C++ Source Parser Object
CPlusPlusSourceFunction[] funcs = cp.getFunctions(); // Get all the functions
for (int i = 0; i < funcs.length; i++) { // Loop through all functions
    funcs[i].append(/* ... code I want to append ... */); // Append some code to function
}
cp.save(); // Save new source
cp.close(); // Close file
How can I do that?
I’d like to be able to do this preferably in Java, C++, Perl, Python or C#. However, I am open to other language API’s.
This is similar to AST from C code
If you're comfortable with Java, ANTLR can easily parse your code into an abstract syntax tree and then apply transformations to that tree. The default AST transform simply prints out the original source.
You can use any parser generator tool to generate a C++ parser, but first you have to get the CFG (context-free grammar) for C++; check ANTLR.
Edit:
ANTLR also supports a lot of target languages.
You need a working grammar and parser for C++ which is, however, not too easy as this can't be constructed with most parser generators out there. But once you have a parser you can actually take the abstract syntax tree of the program and alter it in nearly any way you want.
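The parse, alter, regenerate workflow itself can be shown in miniature with a language whose parser is readily available. The sketch below uses Python's own ast module on Python source, so it is an analogy for the workflow, not a C++ solution; the class and function names are made up, and ast.unparse requires Python 3.9 or later.

```python
import ast


class AppendToFunctions(ast.NodeTransformer):
    """Append the given statements to the body of every function."""

    def __init__(self, extra_source: str):
        self.extra_source = extra_source

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # process nested functions first
        # Re-parse per function so each body gets its own fresh nodes.
        node.body.extend(ast.parse(self.extra_source).body)
        return node

    visit_AsyncFunctionDef = visit_FunctionDef


def append_to_all_functions(source: str, extra_source: str) -> str:
    """Parse source, append extra_source to each function, regenerate text."""
    tree = AppendToFunctions(extra_source).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    original = "def f():\n    x = 1\n"
    print(append_to_all_functions(original, "print('done')"))
```

This mirrors the getFunctions / append / save loop from the question. Python's ast discards comments and original formatting when regenerating, which is exactly the part that is hard to get right for C++ as well.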
The Mozilla project has a tool that does this.
The Clang static analyzer is now somewhat famous for doing a good job analyzing and rewriting C++. Stroustrup wrote a paper about a research project at Texas A&M, but I don't think it's been released.
A robust C++ parser is available with our DMS Software Reengineering Toolkit. It parses a variety of C++ dialects, including ANSI, GNU 3/4, MSVS6, MS Visual Studio 2005 and Managed C++.
It builds ASTs and symbol tables (the latter is way harder than you might think). You can navigate the ASTs, transform into different valid C++ programs, and regenerate code including comments.
In a C# (or general .NET) approach, you might be able to get some use out of the C++/CLI CodeDOM provider. Having not used the C++ version of this type, I don't know how well it would handle template-heavy code.
Have a look at the Doxygen project. It's an open-source project that parses and documents several programming languages, C++ included. I believe using this project's lexer will get you more than halfway.