Source-to-source transformation with Clang (state of the art) - C++

What is the state of the art in source-to-source transformation with Clang?
I have followed almost every resource on the Web, and I can perform source rewrites (with Rewriter) via a Clang plugin, but the final binary is not updated (CodeGen is the main action, and it compiles the original source regardless of what my plugin has modified, even when using AddBeforeMainAction in getActionType).
I have seen some documents about libTooling and how to create a standalone program that uses Clang as a library, but my goal is to create a plugin (FrontendPluginRegistry::Add<>, something "easy" to plug into a stock clang binary) and achieve source-to-source modifications (transparently to users, without overwriting their source files).
Edit: In case it is not clear:
I need something like a "plugin" to extend Clang in an easy way, something that is "integrated" into the compilation process. Why? Because I need to modify the source code during the compilation phase: inject new code and modify the user's source code in one step (I don't want to create a tool that parses the user's source code and then compiles the output files). Also, I would like to distribute my code (the plugin) so that users can use it by themselves.
It is mandatory that this happens during Clang's compilation phase (clang $FLAGS $PLUGIN $ETC -o program source_files...).

the CodeGen is the main activity, and it is compiled regardless of what I have modified in my plugin
Yes, this is because the Clang AST is designed to be immutable: you can't change it after parsing.
So state-of-the-art source-to-source (s2s) transformation in Clang looks like this:
1. Parse the C++ source code into an AST.
2. Apply text replacements to the original source code, generating new source code.
3. Parse the new source code into a new AST.
You can do all of these steps "in memory", so the end user won't notice.
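For step 2, here is a minimal sketch of the rewrite part using clang::Rewriter (MyRewriteConsumer is a hypothetical name; the rewritten source is kept in an in-memory buffer rather than written back to the user's file):

#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"
#include "clang/Rewrite/Core/Rewriter.h"
#include <string>

class MyRewriteConsumer : public clang::ASTConsumer {
public:
  explicit MyRewriteConsumer(clang::Rewriter &R) : Rewrite(R) {}

  void HandleTranslationUnit(clang::ASTContext &Ctx) override {
    // ... walk the AST in Ctx (e.g. with a RecursiveASTVisitor) and call
    // Rewrite.InsertText / Rewrite.ReplaceText on the locations you care about ...

    // Grab the rewritten main file as an in-memory buffer.
    const clang::RewriteBuffer *Buf =
        Rewrite.getRewriteBufferFor(Rewrite.getSourceMgr().getMainFileID());
    if (Buf) {
      std::string NewSource(Buf->begin(), Buf->end());
      // Feed NewSource to a second parse / codegen step instead of the original file.
    }
  }

private:
  clang::Rewriter &Rewrite;
};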
Update:
I have never written Clang plugins myself, but here is what I've noticed:
If you run the Clang frontend to actually generate object code:
clang -cc1 -emit-obj main.c
It will run EmitObjAction. EmitObjAction is a frontend action, so it will parse the input source and run codegen. If you run other frontend actions alongside it, they will have no effect on EmitObjAction, because each frontend action parses the original input source code.
What you can do is replace EmitObjAction with your own fork that does as much re-parsing as you want.
If you set PluginASTAction::ActionType to ReplaceAction, it should replace the built-in codegen action with the one supplied by your plugin.
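A minimal sketch of such a plugin action (the class name, plugin name, and the placeholder consumer are made up; the real consumer would rewrite the source and then drive codegen on the rewritten buffer):

#include "clang/AST/ASTConsumer.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendPluginRegistry.h"
#include <memory>
#include <string>
#include <vector>

class MyReplacingAction : public clang::PluginASTAction {
protected:
  std::unique_ptr<clang::ASTConsumer>
  CreateASTConsumer(clang::CompilerInstance &CI, llvm::StringRef InFile) override {
    // Return a consumer that rewrites the source and then runs codegen on the
    // rewritten buffer (for example by invoking an EmitObjAction yourself).
    return std::make_unique<clang::ASTConsumer>(); // placeholder
  }

  bool ParseArgs(const clang::CompilerInstance &CI,
                 const std::vector<std::string> &Args) override {
    return true;
  }

  // ReplaceAction tells clang to run this action instead of the built-in one.
  ActionType getActionType() override { return ReplaceAction; }
};

static clang::FrontendPluginRegistry::Add<MyReplacingAction>
    X("my-s2s", "source-to-source rewrite that replaces the default codegen action");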

Related

How to generate files using dart source_gen to a different directory

This issue describes the concept
https://github.com/dart-lang/source_gen/issues/272
To summarize:
I am using source_gen to generate some dart code.
I am using json_serializable on the generated dart code.
I wish to output all of the results to a source directory adjacent or below my target source.
The desired directory structure
src
  feature_a
    model.dart
    gen
      model.g.dart
      model.g.g.dart
  feature_b
    ...
I have considered building to cache; however, it seems json_serializable doesn't support this, and even if it did, I don't know if it's even possible to run a builder on files in the cache.
I've also considered an aggregated builder that is mentioned here.
Generate one file for a list of parsed files using source_gen in dart
But json_serializable is still an issue, and the source_gen version in that post is very old and the post doesn't describe the solution well.
This is not possible with build_runner. The issue to follow is https://github.com/dart-lang/build/issues/1689
Note that this doesn't help much with builders that you don't author, and wouldn't work with things like SharedPartBuilder.

How to visit a serialized clang abstract syntax tree (AST)

I've been able to implement an ASTFrontendAction to create an ASTConsumer, which uses a RecursiveASTVisitor to traverse a translation unit decl, thereby visiting all the nodes of an AST for a given source file. I do this by implementing a ToolAction which is passed to ClangTool::run(ToolAction *action). This tool action overrides the ToolAction::runInvocation member function to do some processing between each call to my ASTFrontendAction. So far so good, everything is working as expected, and my custom clang tool is helping me better explore a rather large code base that is over 15 years old.
However, every time I want to run my tool, I need to do a full blown parse of the ASTs. As I mentioned, this is a rather large code base, so it takes a while to do a single run. I have gathered, by looking through the code, that it is possible to create and traverse an AST from a saved file, rather than perform the parse. Googling around confirmed that it's possible to save an AST, and by looking at the ClangTool and ASTUnit API, it seems it's pretty easy to do.
While it seems straightforward to save the AST, I have no idea how to make use of a saved AST when running my custom clang tool. Looking at the code path for running a tool, I see there is a point where the AST is created either by parsing a source file or by reading it from a file. What I would like to do is have all my ASTs available in a file(s), so that each run of my tool will create the ASTs from a file, and not need to perform a full parse (which, I assume would be much faster).
Could someone please help? Thanks in advance!
This worked for me:
// Set up a minimal CompilerInstance so we have diagnostics, a target,
// and file-system options to hand over to ASTUnit.
clang::CompilerInstance CI;
CI.createDiagnostics();
std::shared_ptr<clang::TargetOptions> TO = std::make_shared<clang::TargetOptions>();
TO->Triple = "x86_64-pc-win32"; // see clang -v for your host triple
CI.setTarget(clang::TargetInfo::CreateTargetInfo(CI.getDiagnostics(), TO));

// Deserialize the AST that was previously saved to disk instead of re-parsing.
std::unique_ptr<clang::ASTUnit> ast = clang::ASTUnit::LoadFromASTFile(
    "TheAstFile.ast",
    CI.getPCHContainerReader(),
    clang::ASTUnit::LoadEverything,
    &CI.getDiagnostics(),
    CI.getFileSystemOpts());

// Drive your existing consumer with the deserialized ASTContext.
MyAstConsumer consumer(ast->getASTContext());
consumer.HandleTranslationUnit(ast->getASTContext());
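For completeness: the .ast file that LoadFromASTFile consumes can be produced ahead of time with clang's -emit-ast mode, for example:
clang++ -emit-ast -o TheAstFile.ast TheSourceFile.cpp
(the output name here just matches the snippet above; use whatever layout your tool expects).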

Antlr4 C++ target

We're starting a project where we will need to parse Python source files in a C++ application. I used ANTLR2 a while back to generate a few compilers, but this is the first time I'm using ANTLR4.
It looks like the ANTLR4 C++ target is fairly active at https://github.com/antlr/antlr4-cpp
So, my question is basically: what is the status of the ANTLR4 C++ target, and is it ready to start being used? To use the C++ target, do I just grab the ANTLR4 source, copy antlr4-cpp into that tree, and build?
Note: I don't need something that's absolutely stable and guaranteed never to change, just something that's basically stable enough to start using. If there are small or moderate API changes in the future, that's perfectly fine; I understand it looks fairly early.
If the ANTLR4 C++ target is NOT really ready, what parser generator would you recommend for generating a Python parser with a C++ target?
thanks
The ANTLR4 C++ target is now ready for use: https://soft-gems.net/the-antlr4-c-target-is-here/. It only needs some minor organizational work and must still be merged into the main repo.
This repository has the latest source code for the ANTLR 4 C++ target.
https://github.com/antlr/antlr4-cpp
Here is a good discussion about the status of the target.
https://groups.google.com/forum/#!topic/antlr-discussion/HV2QpwwjtLg
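For what it's worth, driving a generated parser from the C++ runtime looks roughly like the sketch below. PythonLexer/PythonParser are the hypothetical classes ANTLR would generate from your grammar, and the start rule name depends on that grammar:

#include <fstream>
#include "antlr4-runtime.h"
#include "PythonLexer.h"   // hypothetical generated headers
#include "PythonParser.h"

int main() {
  std::ifstream source("example.py");
  antlr4::ANTLRInputStream input(source);     // wrap the raw characters
  PythonLexer lexer(&input);                  // generated lexer
  antlr4::CommonTokenStream tokens(&lexer);   // buffer the token stream
  PythonParser parser(&tokens);               // generated parser
  antlr4::tree::ParseTree *tree = parser.file_input(); // start rule of the grammar
  (void)tree; // walk the tree with a listener or visitor from here
  return 0;
}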

Compatibility issue with old lex-yacc code in new flex-bison

I am on a migration project moving a C++ application from HP-UX to a Red Hat (RHEL) 6.4 server. There is a parser application written using lex/yacc which works fine on HP-UX. Once we moved the lex specification (.l) file and the yacc specification (.y) file to the RHEL 6.4 server, we compiled the code on the new system without much change. But the generated parser is not working: every time it reports a syntax error on the same input file that is parsed correctly on HP-UX. Based on some reference material on lex and flex incompatibilities, these are the points I see in the .l file:
1. It has redefined the input, unput and output methods.
2. The yylineno variable is initialized, and incremented in the redefined input method when a '\n' character is found.
3. The lexer reads its data from standard input (cin), which appears to be in scanner mode.
How can I find out the possible incompatibilities and remedies for this issue? And is there any way other than using gdb to debug the parser?
Both flex and yacc/bison have useful trace features which can aid in debugging grammars.
For flex, simply regenerate the scanner with the -d option, which will cause a trace line to be written to stderr every time a pattern is matched (whether or not it generates a token). I'm not sure how line number tracking by the debugging option will work with your program's explicit yylineno manipulation, but I guess that is just cosmetic. (Personally, unless you have a good reason not to, I'd just let flex track line numbers.)
For bison, you need to both include tracing code in the parser, and enable tracing in the executable. You do the former with the -t command-line option, and the latter by assigning a non-zero value to the global variable yydebug. See the bison manual for more details and options.
These options may or may not work with the HPUX tools, but it would be worth trying because that will give you two sets of traces which you can compare.
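As a concrete illustration, here is a minimal driver that turns on bison tracing, assuming the parser was generated with bison -t (or %debug in the grammar) and the scanner with flex -d:

// If the generated parser is compiled as C, wrap these declarations in extern "C".
extern int yydebug;   // defined by the bison-generated parser when tracing is enabled
extern int yyparse(); // entry point of the generated parser

int main() {
  yydebug = 1;        // any non-zero value makes the parser trace to stderr
  return yyparse();
}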
You don't want to debug the generated code, you want to debug the parser (lex/yacc code).
I would first verify the lexer is returning the same stream of tokens on both platforms.
Then reduce the problem. You know the line of input that the syntax error occurs on. Create a stripped down parser that supports parsing the contents of that line, and if you can't figure out what is going on from that, post the reduced code.

generate C/C++ command line argument parsing code from XML (or similar)

Is there a tool that generates C/C++ source code from XML (or something similar) to create command line argument parsing functionality?
Now a longer explanation of the question:
I have until now used gengetopt for command line argument parsing. It is a nice tool that generates C source code from its own configuration format (a text file). For instance, the gengetopt configuration line
option "max-threads" m "max number of threads" int default="1" optional
among other things generates a variable
int max_threads_arg;
that I later can use.
But gengetopt doesn't provide me with this functionality:
A way to generate Unix man pages from the gengetopt configuration format
A way to generate DocBook or HTML documentation from the gengetopt configuration format
A way to reuse C/C++ source code and to reuse gengetopt configuration lines when I have multiple programs that share some common command line options
Of course gengetopt can provide me with a documentation text by running
command --help
but I am searching for marked up documentation (e.g. HTML, DocBook, Unix man pages).
Do you know of any C/C++ command line argument tool/library with a liberal open source license that would suit my needs?
I guess that such a tool would use XML to specify the command line arguments. That would make it easy to generate documentation in different formats (e.g. man pages). The XML file should only be needed at build time to generate the C/C++ source code.
I know it is possible to use some other command line argument parsing library to read a configuration file in XML at runtime but I am looking for a tool that generate C/C++ source code from XML (or something similar) at build time.
Update 1
I would like to do as much of the computation as possible at compile time and as little as possible at run time. So I would like to avoid libraries that give you a map of the command line options at runtime, such as boost::program_options::variables_map (tutorial).
In other words, I prefer args_info.iterations_arg to vm["iterations"].as<int>().
User tsug303 suggested the TCLAP library. It looks quite nice. It would fit my need to divide the options into groups so that I could reuse code when multiple programs share some common options. Although it doesn't generate the source code from an XML configuration file format, I almost marked that answer as the accepted one.
But none of the suggested libraries fulfilled all of my requirements, so I started thinking about writing my own library. A sketch: a new tool that takes a custom XML format as input and generates both C++ code and an XML schema. Some additional C++ code is generated from the XML schema with the tool CodeSynthesis XSD. The two chunks of C++ code are combined into a library. One extra benefit is that we get an XML schema for the command line options and a way to serialize all of them into a binary format (the CDR format generated by CodeSynthesis XSD). I will see if I get the time to write such a library. Better, of course, would be to find a library that has already been implemented.
Today I read about user Nore's suggested alternative. It looks promising and I will be eager to try it out when the planned C++ code generation has been implemented. The suggestion from Nore looks to be the closest thing to what I have been looking for.
Maybe this TCLAP library would fit your needs?
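For reference, here is a minimal sketch of what the gengetopt line from the question might look like with TCLAP (names and messages are illustrative):

#include <iostream>
#include <tclap/CmdLine.h>

int main(int argc, char **argv) {
  try {
    TCLAP::CmdLine cmd("example program", ' ', "1.0");
    // Roughly the equivalent of:
    //   option "max-threads" m "max number of threads" int default="1" optional
    TCLAP::ValueArg<int> maxThreads("m", "max-threads", "max number of threads",
                                    /*required=*/false, /*default=*/1, "int");
    cmd.add(maxThreads);
    cmd.parse(argc, argv);
    int max_threads_arg = maxThreads.getValue();
    std::cout << "max threads: " << max_threads_arg << "\n";
  } catch (TCLAP::ArgException &e) {
    std::cerr << "error: " << e.error() << " for arg " << e.argId() << "\n";
    return 1;
  }
  return 0;
}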
May I suggest you look at this project? It is something I am currently working on: an XSD schema to describe command line arguments in XML. I wrote XSLT transformations to create bash and Python code, a XUL frontend interface, and HTML documentation.
Unfortunately, I do not generate C/C++ code yet (it is planned).
Edit: a first working version of the C parser is now available. I hope it helps.
I will add yet another project, called protoargs. It generates C++ argument parser code from a protobuf .proto file, using cxxopts.
Unfortunately, it does not satisfy all of the question author's needs: no documentation is generated, and there is no compile-time computation. However, someone may find it useful.
UPD: As mentioned in the comments, I should specify that this is my own project.