Use clang as a library to parse OpenCL code extended with some C++ elements - c++

I am currently working on a Source-to-source compiler that transforms code wirtten in an OpenCL superset to "ordinary" OpenCL. I would really like to use clang as a library to parse and analyze the source code. Especially, I really need all the available type information and I would like to have an AST to make use of clang's Rewrite capabilities.
Fortunately, the OpenCL superset that needs to be parsed is really a "mixture" between OpenCL and C++, i.e. the code is basically OpenCL extended with some C++ stuff. In detail, there are possibly template annotations before a function definition and there may be structs containing methods (including operator definitions).
I was hoping that I can use clang to parse this language, since the clang parser is capable of parsing all these constructs. However, I am not sure how (if possible) to tell clang to parse OpenCL and C++ constructs at once. If possible, I really want to avoid touching the clang code base, but I would prefer using clang as a library instead. Maybe it is possible to setup an appropriate instance of clang's LangOptions class that tells clang to parse all these constructs?
Any ideas on how to make clang parse this mixture between OpenCL and C++? Any help is appreciated, and thanks in advance!

You're trying to mix two different front ends, involving both parsing and name resolution.
I think you are in for a rough trip. The key problem is you are trying to glue together things that had no effort expended, to make them gluable. This usually leads to integration hell. You don't see people doing this with Fortran and C++ for the same reasons.
To start with, you'll discover you will have to define the semantics of how the C++ extensions interact with those of OpenCL. If you check out the C++ standard, you'll discover 600 pages of results from committee arguments on how C++ interacts with itself. So unless you can define a radically simple interaction, you'll have a tough time knowing what your mixed OpenCL/C++ program means.
Your second problem will be interleaving the Clang parsing machinery for C++ (AFAIK hand written code) with the Clang parsing machinery for OpenCL (don't know anything about it, but assumed it follows the C++ style). There's no obviously good reason to believe you can just pick and choose these to interleave easily. It may work out fine; just not a bet I'd care to make.
The next place you are likely to have trouble is in building an AST for the joint language. Maybe it is the case that Clang has defined AST nodes for both C++ and OpenCL in a way that easily composes to a joint Clang/OpenCL tree. Since the node types are chosen by hand, and there was no specific reason to design them to work together, it is also not obvious they will compose nicely.
Your last task, given a "valid" OpenCL/C++ tree, to transform it to OpenCL. How in fact will you expand a C++ template (or any general C++ code) to OpenCL code?
[Check my bio for another system, DMS, that might be a bit better for this task; it provides uniform infrastructure for multiple languages that would make some of this easier. Somewhat similar to what you are trying to do, we have used DMS to mix C++ with F90 and APL concepts for easy expression of vector operations in a prototype Vector C++, but we did not try to preserve F90 and APL syntax and semantics exactly for all the above reasons].
It isn't my purpose to rain on your parade; progress is made by the unreasonable man. Just be sure you understand how big a task you are taking on.

Related

Is there a C-like syntax scripting language interpreter for C++?

I've started long ago to work on a dynamic graph visualizer, editor and algorithm testing platform (graphs with nodes and arcs, not the other kinds).
For the algorithm testing platform i need to let the user write a script or call a script from a file, which will interact with the graph currently loaded. The visualizer would do things like light up nodes while they're being visited by the script algorithm, adding some artificial delay, in order to visualize the algorithm navigating and doing stuff.
Scripts would also be secondly used to add third party features that i could either make available as pre-existing scripts in the program folder OR just integrate inside the program in c++ once they're tested and working.
All my searches for an interpreter to embed in my program sent me to lua;
then i started handwriting my own recursive descent parser for my own C-like syntax scripting language (which i planned to use a subset of C++ grammar so that any code written in my scripting language can be copy-pasted in any C++ code.
It was an interesting crazy idea which i don't regret at all, I have scopes, functions, cycles, gotos, typesafe variables, expressions.
But now that i'm approaching the addition of classes, class methods, inheritance (some default classes would be necessary to interface scripts to the program), i realized it's going to take A LOT of time and effort. A bit too much for a personal project of an ungraduated student with exams to study for… but still i whish to complete this project.
The self-imposed requirement of the scripts being 100% compatible with C++ was all but necessary, it would have been just a little nice extra thing, which i can do without.
Now the question is, is there an alternative to lua with a c-like syntax that supports all i've already done plus classes and inheritance? (being able to add custom "classes" that interface scripts to the program is mandatory)
(i can't assume the user to have a full c++ compiler installed so i cant just compile their "script" at runtime as a dll to load and call it, although i whish i could)
Just-in-time compilation of C++
Parsing C++ is hard. Heck, parsing C is hard. It's difficult to get it right, and there are a lot of edge cases. Thankfully, there are a few libraries out there which can take code and even compile it for you.
libclang
libclang provides a lot of facilities for parsing c++. It's a good, clean library, and it'll parse anything the clang compiler itself will parse. This article here is a good starter
libclang provides a JIT compilation tool that allows you to write and compile C++ at runtime. See this blog post here for a overview of what it does and how to use it. It's very general, very powerful, and user-written code should be fast.
GCC also provides a library called libgccjit for just-in-time compilation during the runtime of a program. libgccjit is a C library, but there's also a C++ wrapper provided by the library maintainers. It can compile abstract syntax trees and link them at runtime, although it's still in Alpha mode.
cppast
If you don't want to use libclang, there's also a library under development called cppast, which is a C++ parser which will give you an abstract syntax tree representation of your c++ code. Unfortunately, it won't parse function bodies.
Other tools
If anyone knows any other libraries for compiling or interpreting C++ at runtime, I encourage them to update this post, or comment them so I can update it!
Here is something that lets you embed a C-like scripting language in your application (and a bunch of other cool things):
http://chaiscript.com/
There is lots of documentation:
https://codedocs.xyz/ChaiScript/ChaiScript/

Current state of the art for c++ parsers?

I understand that it's a very hard thing to do, what with #ifdef, #define, and templates, but what is the state of the art of c++ parsers (be it open source, or proprietary?).
I mean, for a university project I'm thinking of creating a tool for analysing c++ code bases, but it seems very hard to find a good parser for it.
Should I give up and settle for java parsers? Similarly, what's the state of the art for java parsers? What about c#?
Also, would ripping the parser part of g++ apart from it ever work for the purposes of code analysis, or is it too much effort trying to do so?
You're in luck! Clang just started being able to parse most c++ programs within the last few months: http://clang.llvm.org/ It's one of the few open source parsers actually able to parse most of C++. (Mostly just GCC and CLANG, I hear Oink(?) Can get pretty good sometimes) And it's built to be used as a library by IDEs and the like, even has architecture built to support code rewriting.
There are some proprietary parsers that get the job done, But none of them are really usable without source access.
Regarding ripping apart gcc, That's not very practical for code analysis depending on what you are trying to do, you could use the new plugin architecture to get some usable information out of it, however at a very early level in parsing, it does something called term folding, where the parser itself will optimize out things like "x = x" (A simplistic example) And other aspects of the compiler expects this to happen, so it's not trivial to remove. Thus making gcc nearly useless for anything resembling source rewriting.
For C++ you can use GCC with -fdump-translation-unit & friends option to get AST from it.
See: http://www.manpagez.com/man/1/g++/
If you can compile something by g++ then you can get tree from it.
The industry stardard C++ parser, widely used in compilers, in EDG's C++ front end. I have no experience with this; but I understand it handles a huge variety of C++ dialects. I understand you can get it free for research purposes.
The open source standard is the GCC compiler. I hear is it difficult to understand and modify.
There's CLANG as mentioned in other answers. I have no experience here. My understanding is that it is fairly sophisticated especially in terms of supporting analysis.
Our proprietary DMS Software Reengineering Toolkit has full C++ parser with full name and type resolution, preprocessor expansion (or retention, which the other tools will not do). The C++ front end handles several dialects of C++: ANSI, GCC, MS Visual Studio. As you might guess, I have a lot of experience with this one.
DMS/CppFrontEnd has been used to carry out program analyses as well as massive program source-to-source transformations on C++ code, enabled by DMS's pattern parser, which will parse any fragment of C++ code. I believe the other C++ front ends don't provide source-to-source transformations. With those you can likely hack at the ASTs procedurally, but this is pretty inconvenient because you have to know the precise AST structure and for C++ this is pretty complicated.
DMS also has full C, Java and COBOL front ends with name and type resolution as well as control and data flow analysis. It has parsers (but not name and type analysis) for many other langauges, including C#.
AFAIK, the other "C++ parsers" can't do this, sort of by definition. One can apply source-to-source transformations on any of these, or any mixture of these.
clang is worth looking into. it's fast and they provide apis to hook into their backend.
Xcode 4 uses clang for tasks such as parsing, error reporting/detection in some cases, auto-completion, and fix-its.

Lisp/Scheme DSEL in C++

I came across the following post on the boost mailing lists (emphasis mine):
hello all,
does anybody know of an existing spirit/lisp implimentation, and is there
any interest in developing such a project in open source?
None yet, AFAIK.
I'll be writing an example for Spirit2
to complement the tiny-C virtual
machine in there. What's equally
interesting though is that scheme (or
at least a subset of it) can be
implemented in pure c++. No parsing,
just pure DSEL in C++. Now, imagine a
parser that targets this DSEL (through
C++) -- a source to source translator.
Essentially, your scheme code will be
compiled into highly efficient C++.
Has anyone actually done this? I would be very interested in such a DSEL.
I wrote a Lisp-like language called Funky using Spirit in C++. An Open Source version is available at http://funky.vlinder.ca. It wouldn't take too much to turn that into a Lisp-like to C++ translator.
Actually, what it would take is a run-time support library to provide generic closure times and somesuch: if you want to turn the Lisp code into efficient C++, you will basically need C++ classes (functors, etc.) to do the heavy lifting once you get to run-time, so your Lisp-to-C++ translator would need to:
parse the Lisp
create an AST from the Lisp
transform the AST to optimize it, if possible (optimizations in Lisp are different from optimizations in C++, so if you want rally fast C++, you have to optimize the Lisp and let your C++ compiler optimize the generated C++)
generate the C++, for which you'd rely on your run-time support library for things like built-in functions, functor types, etc.
If you were to start from Funky, you'd already have the parse and the AST (though Funky doesn't optimize the AST), so you could go from there an create the run-time and generate the C++...
It wouldn't be overly complicated to write one from scratch either: Lisp grammar isn't that difficult, so most of the work would go into the AST and the run-time support.
If I weren't writing an object-oriented DSL right now, I might try my hand at this.
scheme to (readable) c++
http://www.suri.cs.okayama-u.ac.jp/servlets/APPLICATION.rkt
How about this
Not sure if this is what you want, but:
http://howtowriteaprogram.blogspot.com/2010/11/lisp-interpreter-in-90-lines-of-c.html
It looks like a start, at least.

Need to know about good C++ Reflection API (For RuntimeType Identification -RTTI and runtime calling)

I need a good C++ Reflection API (like a Microsoft API) which enables me to determine the types (class, struct, enum, int, float, double, etc) identified at runtime, declare them, and call methods on those types at runtime.
Regards,
Usman
If you are trying to get to a plugin-type architecture, the POCO Library at http://pocoproject.org has some pieces that might get you part of the way. It will allow you to load a .dll or .so at runtime and create the classes contained in it. But the calling code will still need a header file which describes an interface (or abstract base class) to be able to get the signatures of the methods.
C++ is an incredibly complex language. "Reflective" APIs weren't part of the language design and so basically it isn't there.
If you want general purpose "reflection" and "metaprogramming", you can get that by stepping outside the language and using a program transformation system (PTS). Such a tool for your purpose has to parse C++ (in more than one compilation unit at a time), provide you with access to all the language structures, let you reflect, that is, determine the type (or other properties) of any construct (e.g., variable, expression or other syntax construction) and enable you to apply arbitrary code modifications. Obviously, this won't happen at "runtime" (although I suppose you could shell out to such machinery if you insisted).
Our DMS Software Reengineering Toolkit with its C++ Front End has a proven track record at analyzing and transformating very large sets of C++ code. See the technical papers for some detailed use cases. I don't think the other tools at the Wikipedia site handle C++, although they have the right mindset.
Although it isn't really a PTS (no source-to-source transformations), Clang might work, too. I'm not sure (since I don't use it all), how it can collect type information and use it to drive transformations to the source code. Its clearly very good at using such information to do LLVM code generation.

Partially parse C++ for a domain-specific language

I would like to create a domain specific language as an augmented-C++ language. I will need mostly two types of contructs:
Top-level constructs for specialized types or declarations
In-code constructs, i.e. to add primitives to make functions calls or idiom easier
The language will be used for scientific computing purposes, and will ultimately be translated into plain C++. C++ has been chosen as it seems to offer a good compromise between: ease of use, efficiency and availability of a wide range of libraries.
A previous attempt using flex and bison failed due to the complexity of the C++ syntax. The existing parser can still fail on some constructs. So we want to start over, but on better bases.
Do you know about similar projects? And if you attempted to do so, what tools would you use? What would be the main pitfalls? Would you have recommendations in term of syntax?
There are many (clever) attempts to have domain specific languages within the C++ language.
It's usually called DSEL for Domain Specific Embedded Language. For example, you could look up the Boost.Spirit syntax, or Boost.rdb (in the boost vault).
Those are fully compliant C++ libraries which make use of C++ syntax.
If you want to hide some complexity, you might add in a few macros.
I would be happy to provide some examples if you gave us something to work with :)
You can try extending an open source Elsa C++ parser (it is now a part of a Mozilla's Pork project):
https://wiki.mozilla.org/Pork
The way to extend C++ is not to try to extend the language, which will be extremely difficult and probably break as new base compiler releases implement new features, but to write class libraries to support your problem domain. This has been what C++ programming has been all about since the language's inception.
If you really want to extend C++, you'll need a full C++ parser plus name and type resolution. As you've found out, this is pretty hard. Your best solution is to get an existing one and modify it.
Our DMS Software Reengineering Toolkit is an infrastructure for implementing langauge processors. It is
designed to support the construction of tools that parse languages, carry out transformations, and spit out the same language (with enhanced code) or a different language/dialect.
DMS has a full C++ Front End, that parses C++, builds abstract syntax trees and symbol tables (e.g., all that name and type resolution stuff).
The DMS/C++ front end is provided with DMS in source form, so that it can be customized to achieve the kind of effect you want. You'd define your DSL as an extension of the C++ front end, and then write transformations that convert your special constructs into "vanilla" C++ constructs, and then spit out compilable result.
DMS/C++ have been used for a wide variety of transformation tasks, including ones that involved extending C++ as you've described, and including tasks that carry out massive reorganizations of large C++ applications. (See the Publications at that website).
To solve you first bullet, maybe you can use C++0x new features "initializer lists", and "user defined litterals" avoiding the need for a new parser. They may help for the second bullet, too.