Is it possible to get the syntax tree with liblua?
I need the AST of lua code, but I can't depend on ANTLR4, so I'm looking for a self contained solution. Since my host app already embeds lua, liblua would be perfect.
If not with liblua, what other options are there for parsing Lua in C++?
The built-in Lua parser is a one-pass compiler which directly generates Lua VM instructions. No AST is created, and it would be more complicated to decompile the VM code into an AST than to create a parser using whatever parser generator you feel comfortable with.
For example, Bison produces perfectly acceptable C++ code without requiring any run-time. It has a C++ API but it is also possible to use the C template and compile the result as a C++ program.
Related
I want to use ANTLR 3.5 in a C++ program of mine, but I'm running into trouble with how to actually make use of the parser and lexer that gets generated. Using a grammar similar to the one here, I can do something like SimpleCalcParser.expr(). However, if I want to do something more complex (e.g. parse a language that doesn't just result in a single value, but something more imperative or declarative), it seems quite difficult with the C++ target. As far as I can tell, there is no ability to output an AST or template. Without this, I'm not sure how you can do anything other than just determine if your input parsed correctly or not. Does anyone know how to do this with the C++ target, or would using the C target to generate an AST and using that in C++ be a better option?
some time ago I created some patches for C++ target(see github). There should be AST generation added(but not 100% complete) and also added some tests, which you can use as an example. With current ATLR 3.5 each rule has to return some complex class as a value. And you have to "build" tree manually by using rule actions.
We have a CORBA implementation that autogenerates Java and C++ stubs for us. Because the CORBA-generated code is difficult to work with, we need to write wrappers/helpers around the CORBA code. So we have a 2-step code generation process (yes, I know this is bad):
CORBA IDL -> annoying CORBA-generated code -> useful wrappers/helper functions
Using Java's reflection, I can inspect the CORBA-generated code and use that to generate additional code. However, because C++ doesn't have reflection, I am not sure how to do this on the C++ side. Should I use a C++ parser? C++ templates?
TLDR: How to generate C++ code using generated C++ code as input?
Have you considered to take a step back and use the IDL as source for a custom code generator? Probably you have some wrapper code that hides things like duplicate, var, ptr, etc. We have a Ruby based CORBA IDL compiler that currently generates Ruby and C++ code. That could be extended with a customer generator, see https://www.remedy.nl for RIDL and R2CORBA.
Another option would be to check out the IDL to C++11 language mapping, more details on https://www.taox11.org. This new language mapping is much easier to use and uses standard types and STL containers to work with.
GCC XML could help in recovering the interface.
I'm using it to write a Prolog foreign interface for OpenGL and Horde3D rendering engine.
The interfaces I'm interested to are limited to C, but GCC XML handles C++ as well.
GCC XML parse source code interface and emits and XML AST. Then with an XML library it's fairly easy extract requested info. A nuance it's the lose of macro' symbols: AFAIK just the values survive to the parse. As an example, here (part of ) the Prolog code used to generate the FLI:
make_funcs(NameChange, Xml, FileName, Id) :-
index_id(Xml, Indexed),
findall(Name:Returns:ArgTypes,
(xpath(Xml, //'Function'(#file = Id, #name = Name, #returns = ReturnsId), Function),
typeid_indexed(Indexed, ReturnsId, Returns),
findall(Arg:Type, (xpath(Function, //'Argument'(#name = Arg, #type = TypeId), _),
typeid_indexed(Indexed, TypeId, Type)), ArgTypes)
),
AllFuncs),
length(AllFuncs, LAllFuncs),
writeln(FileName:LAllFuncs),
fat('prolog/h3dplfi/~s.cpp', [FileName], Cpp),
open(Cpp, write, Stream),
maplist(\X^((X = K-A -> true ; K = X, A = []), format(Stream, K, A), nl(Stream)),
['#include "swi-uty.h"',
'#include <~#>'-[call(NameChange, FileName)]
]),
forall(member(F, AllFuncs), make_func(Stream, F)),
close(Stream).
xpath (you guess it) it's the SWI-Prolog library that make analysis simpler...
If you want to reliably process C++ source code, you need a program transformation tool that understands C++ syntax and semantics, can parse C++ code, transform the parsed representation, and regenerate valid C++ code (including the original comments). Such a tool provides in effect arbitrary metaprogramming by operating outside the language, so it is not limited by the "reflection" or "metaprogramming" facilities built into the language.
Our DMS Software Reengineering Toolkit with its C++ Front End can do this.
It has been used on a number of C++ automated transformation tasks, both (accidentally) related to CORBA-based activities. The first included reshaping interfaces for a proprietary distributed system into CORBA-compatible facets. The second reshaped a large CORBA-based application in the face of IDL changes; such changes in effect cause the code to be moved around and causes signature changes. You can find technical papers at the web site that describe the first activity; the second was done for a major defense contractor.
Take a look at Clang compiler, aside from being a standalone compiler it is also intended to be used as an library in situations like the one you describe. It will provide you with parse tree on which you could do your analysis and transformations
I'm looking to get an AST for C++ that I can then parse with an external program. What programs are out there that are good for generating an AST for C++? I don't care what language it is implemented in or the output format (so long as it is readily parseable).
My overall goal is to transform a C++ unit test bed to its corresponding C# wrapper test bed.
You can use clang and especially libclang to parse C++ code. It's a very high quality, hand written library for lexing, parsing and compiling C++ code but it can also generate an AST.
Clang also supports C, Objective-C and Objective-C++. Clang itself is written in C++.
Actually, GCC will emit the AST at any stage in the pipeline that interests you, including the GENERIC and GIMPLE forms. Check out the (plethora of) command-line switches begining with -fdump- — e.g. -fdump-tree-original-raw
This is one of the easier (…) ways to work, as you can use it on arbitrary code; just pass the appropriate CFLAGS or CXXFLAGS into most Makefiles:
make CXXFLAGS=-fdump-tree-original-raw all
… and you get “the works.”
Updated: Saw this neat little graphing system based on GCC AST's while checking my flag name :-) Google FTW.
http://digitocero.com/en/blog/exporting-and-visualizing-gccs-abstract-syntax-tree-ast
Our C++ Front End, built on top of our DMS Software Reengineering Toolkit can parse a variety of C++ dialects (including C++11 and ObjectiveC) and export that AST as an XML document with a command line switch. See example ASTs produced by this front end.
As a practical matter, you will need more than the AST; you can't really do much with C++ (or any other modern language) without an understanding of the meaning and scope of each identifier. For C++, meaning/scope are particularly ugly. The DMS C++ front end handles all of that; it can build full symbol tables associating identifers to explicit C++ types. That information isn't dumpable in XML with a command line switch, but it is "technically easy" to code logic in DMS to walk the symbol table and spit out XML. (there is an option to dump this information, just not in XML format).
I caution you against the idea of manipulating (or even just analyzing) the XML. First, XSLT isn't a particularly good way to understand the meaning of the ASTs, let alone transform the AST, because the ASTs represent context sensitive language structures (that's why you want [nee MUST HAVE] the symbol table). You can read the XML into a dom-like tree if you like and write your own procedural code to manipulate it. But source-to-source transformations are an easier way; you can write your transformations using C++ notation rather than buckets of code goo climbing over a tree data structure.
You'll have another problem: how to generate valid C++ code from the transformed XML. If you don't mind spitting out raw text, you can solve this problem in purely ad hoc ways, at the price of having no gaurantee other than sweat that generated code is syntactically valid. If you want to generate a C++ representation of your final result as an AST, and regenerate valid text from that, you'll need a prettyprinter, which are not technically hard but still a lot of work to build especially for a language as big as C++.
Finally, the reason that tools like DMS exist is to provide the vast amount of infrastructure it takes to process/manipulate complex structure such as C++ ASTs. (parse, analyse, transform, prettyprint). You can try to replicate all this machinery yourself, but this is usually a poor time/cost/productivity tradeoff. The claim is it is best to stay within the tool ecosystem rather than escape it and build bad versions of it yourself. If you haven't done this before, you'll find this out painfully.
FWIW, DMS has been used to carry out massive analysis and transformations on C++ source code. See Publications on DMS and check the papers by Akers on "Re-engineering C++ Component Models".
Clang is based on the same kind of philosophy; there's an ecosystem of tools.
YMMV, but I'd be surprised.
I want to programmatically parse and edit C++ source files. I need to change/add code in certain sections of code (i.e. in functions, class blocks, etc). I would also (preferably) be able to get comments as well.
Part of what I want to do can be explained by the following piece of code:
CPlusPlusSourceParser cp = new CPlusPlusSourceParser(“x.cpp”); // Create C++ Source Parser Object
CPlusPlusSourceFunction[] funcs = cp.getFunctions(); // Get all the functions
for (int i = 0; i < funcs.length; i++) { // Loop through all functions
funcs[i].append(/* … code I want to append …*/); // Append some code to function
}
cp.save(); // Save new source
cp.close(); // Close file
How can I do that?
I’d like to be able to do this preferably in Java, C++, Perl, Python or C#. However, I am open to other language API’s.
This is similar to AST from C code
If your comfortable with Java antlr can easily parser your code into an abstract syntax tree, and then apply transformation to that tree. A default AST transform is to simply print out the original source.
You can use any parser generator tool to generate a c++ parser for you, but first you have to get the CFG (context free grammar) for C++ , check Antlr
Edit:
Also Antlr supports a lot of target languages
You need a working grammar and parser for C++ which is, however, not too easy as this can't be constructed with most parser generators out there. But once you have a parser you can actually take the abstract syntax tree of the program and alter it in nearly any way you want.
The Mozilla project has a tool that does this.
The Clang static analyzer is now somewhat famous for doing a good job analyzing and rewriting C++. Stroustrup wrote a paper about a research project at Texas A&M, but I don't think it's been released.
A robust C++ parser is available with our DMS Software Reengineering Toolkit. It parses a variety of C++ dialects including ANSI, GNU 3/4, MSVS6 and MSVisual Studio 2005 and managaged C++.
It builds ASTs and symbol tables (the latter is way harder than you might think). You can navigate the ASTs, transform into different valid C++ programs, and regenerate code including comments.
In a C# -- or general .net -- approach, you might be able to get some use out of the C++/CLI CodeDOM provider -- having not used the C++ version of this type, I don't know how well it would handle code that is template heavy.
have a look at the doxygen project, its a open source project, to parse and document several programming languages, C++ included. I believe using this project's lexer will get you more than half the way
I have several functions in my program that look like this:
void foo(int x, int y)
Now I want my program to take a string that looks like:
foo(3, 5)
And execute the corresponding function. What's the most straightforward way to implement this?
When I say straightforward, I mean reasonably extensible and elegant, but it shouldn't take too long to code up.
Edit:
While using a real scripting language would of course solve my problem, I'd still like to know if there is a quick way to implement this in pure C++.
You can embed Python fairly simply, and that would give you a really powerful, extensible way to script your program. You can use the following to easily (more or less) expose your C++ code to Python:
Boost Python
SWIG
I personally use Boost Python and I'm happy with it, but it is slow to compile and can be difficult to debug.
You could take a look at Lua.
I'd also go for the scripting language answer.
Using pure C++, I would probably use a parser generator, which will will get the token and grammar rules, and will give me C code that exactly can parse the given function call language, and provides me with an syntax tree of that call. flex can be used to tokenize an input, and bison can be used to parse the tokens and transform them into an syntax tree. Alternatively to that approach, Boost Spirit can be used to parse the function call language too. I have never used any of these tools, but have worked on programs that use them, thus I somewhat know what I would use in case I had to solve that problem.
For very simple cases, you could change your syntax to this:
func_name arg1, arg2
Then you can use:
std::istringstream str(line);
std::string fun_name; str >> fun_name;
map[fun_name](tokenize_args(str));
The map would be a
std::map<std::string, boost::function<void(std::vector<std::string>)> > map;
Which would be populated with the functions at the start of your program. tokenize_args would just separate the arguments, and return a vector of them as strings. Of course, this is very primitive, but i think it's reasonable if all you want is some way to call a function (of course, if you want really script support, this approach won't suffice).
As Daniel said:
Script languages like Lua and Python would be the most used script languages for binding together c++ libraries.
You will have to add a script interface to your c++ application. The buildup of this interface obviously depends on what script language you chose.
CERN provides CINT, a C/C++ interpreter that can be embedded within your application to provide scripting capabilities.
If you only wish to call a function by literal name, you could use linker-specific functions.
On POSIX-compliant operating systems (like Linux), you can use dlopen() and dlsym(). You simply parse the input string and figure out the function name and arguments. Then you can ask the linker to find the function by name using dlsym().
On Windows however, these functions aren't available (unless there's some POSIX environment around, like Cygwin). But you can use the Windows API.
You can take a look here for details on these things: http://en.wikipedia.org/wiki/Dynamic_loading
C++ Reflection [2] by Fabio Lombardelli provides full re-flection for C++ through template metaprogramming tech-niques. While it is fully compliant with the C++ standards,it requires the programmer to annotate the classes in order forthem to be reflective
http://cppreflect.sourceforge.net/
otherwise you'd want a function pointer hash table i think
Does your system have to "take a string"? You could expose COM (or CORBA, or whatever) interfaces on your application and have whatever is generating these commands call into your application directly.