I've read this link but still don't fully understand the difference between TraverseDecl and VisitDecl (and their use cases): http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html
Which method should I be overriding when writing my RecursiveASTVisitor?
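TraverseDecl drives the recursion: it visits a declaration and then walks into its children. VisitDecl is the callback that the traversal invokes for each declaration it reaches, and that is where you extract the relevant information. In most cases you override the Visit* methods and only call TraverseDecl (typically from your ASTConsumer) to start the walk. Override a Traverse* method only when you need to change how the tree is walked, for example to skip a subtree.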
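The split between the two kinds of methods can be sketched without Clang at all. The following is a toy model only: Decl, ToyRecursiveVisitor, NameCollector, SkippingCollector and sampleTree are hypothetical names invented for illustration, and the real clang::RecursiveASTVisitor has many more hooks. It assumes only the CRTP shape and the "return false to stop" convention of the real class.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a declaration node (hypothetical, not clang::Decl).
struct Decl {
    std::string name;
    std::vector<Decl> children;
};

// Toy stand-in for clang::RecursiveASTVisitor, using the same CRTP shape.
template <typename Derived>
class ToyRecursiveVisitor {
public:
    // Traverse* drives the recursion: visit this node, then walk its children.
    // Returning false aborts the whole traversal.
    bool TraverseDecl(const Decl& d) {
        if (!derived().VisitDecl(d)) return false;
        for (const Decl& child : d.children)
            if (!derived().TraverseDecl(child)) return false;
        return true;
    }

    // Visit* is the per-node callback; the default does nothing.
    bool VisitDecl(const Decl&) { return true; }

private:
    Derived& derived() { return static_cast<Derived&>(*this); }
};

// Typical use: override Visit* to collect information from every node.
class NameCollector : public ToyRecursiveVisitor<NameCollector> {
public:
    std::vector<std::string> names;
    bool VisitDecl(const Decl& d) {
        names.push_back(d.name);
        return true;
    }
};

// Less common: override Traverse* to change how the tree is walked,
// here by skipping any subtree rooted at a node named "detail".
class SkippingCollector : public ToyRecursiveVisitor<SkippingCollector> {
public:
    std::vector<std::string> names;
    bool VisitDecl(const Decl& d) {
        names.push_back(d.name);
        return true;
    }
    bool TraverseDecl(const Decl& d) {
        if (d.name == "detail") return true;  // prune, but keep walking siblings
        return ToyRecursiveVisitor<SkippingCollector>::TraverseDecl(d);
    }
};

// Small tree: TU -> { A, detail -> { hidden }, B }
Decl sampleTree() {
    return Decl{"TU", {Decl{"A", {}},
                       Decl{"detail", {Decl{"hidden", {}}}},
                       Decl{"B", {}}}};
}
```

Calling TraverseDecl(sampleTree()) on a NameCollector records every node in pre-order, while the SkippingCollector never sees "detail" or anything below it. That is the practical difference: Visit* answers "what do I do at this node", Traverse* answers "which nodes do I walk to".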
Follow these two links for more details and a simple checker example:
http://clang.llvm.org/docs/RAVFrontendAction.html
How to traverse clang AST manually?
Is there any document listing the Analysis and Transform passes available for use in the AnalysisUsage::addRequired<> and Pass::getAnalysis<> functions?
I can get a list of passes in http://llvm.org/docs/Passes.html, but it only shows the command line names for the passes. How can I know the underlying pass classes?
Not really, no. Just look at the source. The header files in include/llvm/Analysis/ and include/llvm/Transforms/ will tell you everything you need to know.
Moreover, grepping over the source for getAnalysis< will tell you which passes are used as analyses inside the LLVM source code.
How does std.conv.to!string(enum.member) work? How is it possible that a function takes an enum member and returns its name? Does it use a compiler extension or something similar? It's a bit unusual to me since I come from the C/C++ world.
What it does is use compile time reflection on the enum type to get a list of members (the names as strings) and their values. It constructs a switch statement out of this information for a fast lookup to get the name from a value. to!SomeEnum("a_string") uses the same principle, just in the other direction.
The compile time reflection info is accessed with __traits(allMembers, TheEnumType), which returns a list of strings that can be looped over to build the switch statement. Then __traits(getMember, TheEnumType, memberName) is used to fetch the value of each member.
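For readers coming from C/C++, the shape of the generated code may be easier to see written out by hand. C++ has no equivalent of __traits(allMembers), so this sketch swaps in an X-macro as the single list of members. It is not how Phobos is implemented, and Color, COLOR_MEMBERS, toString and fromString are hypothetical names chosen for illustration.

```cpp
#include <stdexcept>
#include <string>

// Single list of members; plays the role of __traits(allMembers, Color).
#define COLOR_MEMBERS(X) X(Red) X(Green) X(Blue)

enum class Color {
#define X(name) name,
    COLOR_MEMBERS(X)
#undef X
};

// Value -> name: one case per member, expanded from the list above.
std::string toString(Color c) {
    switch (c) {
#define X(name) case Color::name: return #name;
        COLOR_MEMBERS(X)
#undef X
    }
    throw std::invalid_argument("value is not a member of Color");
}

// Name -> value: the same list, used in the other direction.
Color fromString(const std::string& s) {
#define X(name) if (s == #name) return Color::name;
    COLOR_MEMBERS(X)
#undef X
    throw std::invalid_argument("no such member: " + s);
}
```

In D the member list comes from the compiler rather than from a macro, so the enum declaration stays ordinary and the lookup code is generated for any enum type.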
You can read more about traits here: http://dlang.org/traits.html#allMembers
That allMembers one works on many types, not just classes as seen in the example, but also structs, enums, and more, even modules.
The phobos source code has some examples like EnumMembers in std.traits: https://github.com/D-Programming-Language/phobos/blob/master/std/traits.d#L3360
The Phobos source is kind of hard to read, but on line 3399, at the bottom of that function, you can see it using __traits(allMembers) as its data source. std.conv.to is implemented in terms of many std.traits functions.
You can also check out the sample chapter tab to get the Reflection chapter out of my D cookbook which discusses this stuff too:
http://www.packtpub.com/discover-advantages-of-programming-in-d-cookbook/book
The final example in that chapter shows how to use several of the reflection capabilities to build a little function dispatcher based on strings. The following chapter (not available for free though) shows how to build a switch out of it for better efficiency too. It's actually pretty easy: just put the case statements inside a foreach over the compile time data and the D compiler will unroll then optimize the lookup table for you!
I need to parse a C++ class file (.h) and extract the following information:
Function names
Return types
List of parameter types of each function
Assume that there is a special tag using which I can recognize if I need to parse a function or not.
For example:
#include <someHeader>
class Test
{
public:
    Test();
    void fun1();
    // *Expose* //
    void fun2();
};
So I need to parse only fun2().
I read the basic grammar here, but found it too complex to comprehend.
Q1. I can't make out how complex this task is. Can someone provide a simpler grammar for a function declaration to perform this parsing?
Q2. Is my approach right or should I consider using some library rather than reinventing?
Edit: Just to clarify, I don't have a problem with parsing itself; the problem is more about understanding the grammar I need to parse.
A C++ header may include arbitrary C++ code. Hence, parsing the header might be as hard as parsing all kinds of C++ code.
Your task becomes easier if you can make certain assumptions about your header file. For instance, if you always have an EXPOSE tag on the line before your function and each function declaration is always on a single line, you could first grep for those lines:
grep -A1 EXPOSE <files>
And then you could apply a regular expression to filter out the information you need.
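As a rough sketch of that second step, the following applies a regular expression to the line that follows each tag. It assumes the `// *Expose* //` tag from the question, one declaration per line, and a return type separated from the function name by whitespace. FunctionInfo, trim and parseExposed are hypothetical names, and the pattern is a heuristic, not a C++ grammar: parameters are kept as raw text, and template arguments containing commas would be split incorrectly.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

struct FunctionInfo {
    std::string returnType;
    std::string name;
    std::vector<std::string> parameters;  // raw text, e.g. "const char* b"
};

// Strips leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Returns the declarations found on the line directly after each tag line.
std::vector<FunctionInfo> parseExposed(const std::string& header) {
    static const std::regex tag(R"(^\s*//\s*\*Expose\*)");
    // <return type> <name> ( <parameters> ) [const] ;
    static const std::regex decl(
        R"(^\s*([\w:<>*&\s]+?)\s+(\w+)\s*\(([^)]*)\)\s*(?:const\s*)?;)");

    std::vector<FunctionInfo> result;
    std::istringstream lines(header);
    std::string line;
    bool tagged = false;  // true when the previous line was the tag
    while (std::getline(lines, line)) {
        if (std::regex_search(line, tag)) {
            tagged = true;
            continue;
        }
        std::smatch m;
        if (tagged && std::regex_search(line, m, decl)) {
            FunctionInfo info{m[1].str(), m[2].str(), {}};
            std::istringstream params(m[3].str());
            std::string p;
            while (std::getline(params, p, ',')) {
                p = trim(p);
                if (!p.empty()) info.parameters.push_back(p);
            }
            result.push_back(info);
        }
        tagged = false;
    }
    return result;
}
```

Run on the Test class from the question, this reports only fun2, with return type void and no parameters. Anything beyond such simple declarations (default arguments with parentheses, function pointers, multi-line declarations) is where a real parser becomes the better choice.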
Nevertheless, I'd recommend using existing tools. This seems to be a tutorial on how to do it with clang and Python.
GCC XML is an open source tool that emits the AST (Abstract Syntax Tree). See this other answer where I posted about the usage I made of it.
You should consider it only if you are proficient with (or keen to learn) an XML parser for inspecting the AST. It's a fairly complex structure...
You will still need to grep for the comments identifying your required snippets, as comments are lost in the XML output.
If you are doing this just for documentation, Doxygen could be a good bet.
Either way it may give you some pointers as to how to do this.
I would like to write a small tool that takes a C++ program (a single .cpp file), finds the "main" function and adds 2 function calls to it, one in the beginning and one in the end.
How can this be done? Can I use g++'s parsing mechanism (or any other parser)?
If you want to make it solid, use clang's libraries.
As suggested by some commenters, let me put forward my idea as an answer:
So basically, the idea is:
... original .cpp file ...
#include <yourHeader>
namespace {
    SpecialClass specialClassInstance;
}
Where SpecialClass is something like:
class SpecialClass {
public:
    SpecialClass() {
        firstFunction();
    }
    ~SpecialClass() {
        secondFunction();
    }
};
This way, you don't need to parse the C++ file. Since you are declaring a global, its constructor will run before main starts and its destructor will run after main returns.
The downside is that you don't get to know the relative order of when your global is constructed compared to others. So if you need to guarantee that firstFunction is called before any other constructor elsewhere in the entire program, you're out of luck.
I've heard the GCC parser is both hard to use and even harder to get at without invoking the whole toolchain. I would try the clang C/C++ parser (libparse), and the tutorials linked in this question.
Adding a function call at the beginning of main() and another at the end of main() is a bad idea. What if someone returns from the middle?
A better idea is to instantiate a class at the beginning of main() and let that class's destructor call the function you want called at the end. This ensures that the function always gets called.
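A minimal sketch of that guard object follows. MainGuard, events and runMain are hypothetical names. runMain stands in for the body of main() so that the early-return case can be exercised, and firstFunction and secondFunction only record that they ran.

```cpp
#include <string>
#include <vector>

// Event log used only so the order of calls can be observed.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// Stand-ins for the two calls to be injected.
void firstFunction()  { events().push_back("firstFunction"); }
void secondFunction() { events().push_back("secondFunction"); }

// Calls firstFunction() on construction and secondFunction() on destruction.
class MainGuard {
public:
    MainGuard()  { firstFunction(); }
    ~MainGuard() { secondFunction(); }
    MainGuard(const MainGuard&) = delete;
    MainGuard& operator=(const MainGuard&) = delete;
};

// Stands in for the body of main(); returns early when asked to.
int runMain(bool returnEarly) {
    MainGuard guard;  // first statement: firstFunction() runs here
    events().push_back("work");
    if (returnEarly) return 1;  // secondFunction() still runs on this path
    events().push_back("more work");
    return 0;
}
```

The destructor runs on every return path and also when an exception unwinds the function, but not if the program leaves through std::exit or std::abort, so those exits would still need separate handling.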
If you have control of your main program, you can hack a script to do this, and that's by far the easiest way. Simply make sure the insertion points are obvious (odd comments, required placement of tokens, you choose) and unique (including outlawing general coding practices if you have to, to ensure the uniqueness you need is real). Then a dumb string hacking tool that reads the source, finds the unique markers, and inserts your desired calls will work fine.
If the source of the main program comes from other sources, and you don't have control, then to do this well you need a full C++ program transformation engine. You don't want to build this yourself, as just the C++ parser is an enormous effort to get right. Others here have mentioned Clang and GCC as answers.
An alternative is our DMS Software Reengineering Toolkit with its C++ front end. DMS, using its C++ front end, can parse code (for a variety of C++ dialects), build ASTs, and carry out full name/type resolution to determine the meaning/definition/use of all symbols. It provides procedural and source-to-source transformations to enable changes to the AST, and can regenerate compilable source code complete with original comments.
Whilst refactoring some old code I realised that a particular header file was full of function declarations for functions long since removed from the .cpp file. Does anyone know of a tool that could find (and strip) these automatically?
If possible, you could write a test.cpp file that calls them all; the linker will flag the ones that have no definition as unresolved. This way your test code only needs to build, and you don't have to worry about actually running it.
PC-lint can be tuned for this specific purpose.
I tested the following code for your question:
void foo(int);
int main()
{
    return 0;
}
lint.bat test_unused.cpp
and got the following result:
============================================================
--- Module: test_unused.cpp (C++)
--- Wrap-up for Module: test_unused.cpp
Info 752: local declarator 'foo(int)' (line 2, file test_unused.cpp) not referenced
test_unused.cpp(2) : Info 830: Location cited in prior message
============================================================
So you can enable just message number 752 for your purpose:
lint.bat -"e*" +e752 test_unused.cpp
-"e*" suppresses all the messages and +e752 turns this specific one back on.
If you index the code with Doxygen, you can see where each function is referenced from. However, you would have to browse through each class (1 HTML page per class) and scan for those that don't have anything pointing to them.
Alternatively, you could use ctags to generate a list of all functions in the code, then use objdump or some similar tool to get a list of all functions in the .o files, and then compare those lists. However, this can be problematic due to name mangling.
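The comparison step itself is a set difference. The sketch below assumes both lists have already been extracted and reduced to comparable names (for instance by running the object-file symbols through a demangler such as c++filt); declaredButNotDefined is a hypothetical helper, and producing the two input lists is left to ctags and objdump.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the names that appear in `declared` but not in `defined`,
// sorted and without duplicates.
std::vector<std::string> declaredButNotDefined(
        const std::vector<std::string>& declared,
        const std::vector<std::string>& defined) {
    const std::set<std::string> declaredSet(declared.begin(), declared.end());
    const std::set<std::string> definedSet(defined.begin(), defined.end());
    std::vector<std::string> missing;
    std::set_difference(declaredSet.begin(), declaredSet.end(),
                        definedSet.begin(), definedSet.end(),
                        std::back_inserter(missing));
    return missing;
}
```

Comparing plain names like this treats overloads as a single function, so a declaration whose overload was removed would go unnoticed unless the full signatures are compared instead.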
I don't think there is such a thing, because some functions without a body in the actual source tree might be defined in some external library. This can only be done by creating a script which makes a list of the functions declared in a header and verifies whether they are ever called.
I have a C++ ftplugin for vim that is able to check and report unmatched functions. (Vimmers: the ftplugin suite is not yet straightforward to install.) The ftplugin is based on ctags results, so its heuristic could easily be adapted to other environments. There are sometimes false positives in the case of inline functions.
HTH,
In addition to Doxygen (@Milan Babuskov), you can check whether your compiler has warnings for this. E.g. gcc has -Wunused-function for static functions, and -fdump-ipa-cgraph dumps call-graph information.
I've heard good things about PC-Lint, but I imagine it's probably overkill for your needs.