I'm using Clang to do some analysis and I need to find the parent of a declaration in the AST. For instance, in the following code I have int x and I want to get its parent, which should be the function declaration:
int main(int x) { return 0; }
I know, as mentioned in this link http://comments.gmane.org/gmane.comp.compilers.clang.devel/2152, that there is a ParentMap class to track parent nodes. However, this just represents a map from Stmt* -> Stmt*, and I need to find the parent of a declaration. Does anyone know how I could do this?
You can use ASTContext::getParents() to find the parents of an AST node.
Example code:
const Stmt* ST = str;
while (true) {
//get parents
const auto& parents = pContext->getParents(*ST);
if ( parents.empty() ) {
llvm::errs() << "Can not find parent\n";
return false;
}
llvm::errs() << "find parent size=" << parents.size() << "\n";
ST = parents[0].get<Stmt>();
if (!ST)
return false;
ST->dump();
if (isa<CompoundStmt>(ST))
break;
}
ASTContext::getParents() accepts either a Stmt or a Decl as its argument.
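To make the idea of walking up through parents concrete without depending on Clang, here is a minimal sketch on a toy tree. Node, ParentIndex, indexParents, and nearestAncestor are all hypothetical names invented for this illustration; they are not Clang types or functions, and Clang's own parent tracking is more involved (a node can have more than one parent, for example).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy AST node; NOT a Clang type. Real code would use clang::Decl / clang::Stmt.
struct Node {
    std::string kind;                   // e.g. "FunctionDecl", "ParmVarDecl"
    std::vector<const Node*> children;  // non-owning links to child nodes
};

// Hypothetical child -> parent lookup table.
using ParentIndex = std::unordered_map<const Node*, const Node*>;

// Record child -> parent links for the whole subtree rooted at `node`.
void indexParents(const Node& node, ParentIndex& parents) {
    for (const Node* child : node.children) {
        assert(child != nullptr);
        parents[child] = &node;
        indexParents(*child, parents);
    }
}

// Walk upwards from `start` until a node of the requested kind is found.
// Returns nullptr when the root is reached without a match.
const Node* nearestAncestor(const Node& start, const std::string& kind,
                            const ParentIndex& parents) {
    const Node* current = &start;
    while (true) {
        auto it = parents.find(current);
        if (it == parents.end()) return nullptr;
        current = it->second;
        if (current->kind == kind) return current;
    }
}
```

For the question's example, the tree would have a "FunctionDecl" node with a "ParmVarDecl" child, and nearestAncestor on the parameter with kind "FunctionDecl" would return the function node.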
A ParentMap, as described in the linked thread, is exactly what you are looking for. In Clang, all specific declaration classes inherit from clang::Decl, which provides
virtual Stmt* getBody() const;
Alternatively, you might be happy with the ready-made AST matchers, which make writing queries on the AST much easier. The clang-tidy checks make heavy use of them and are pretty easy to follow; see the clang-tidy sources.
Regarding the parent of a FunctionDecl, there is something to note: a function declaration may be a member of a class, or it may be a free-standing declaration.
If the FunctionDecl is a member of a class, then it is a CXXMethodDecl, so you can check with:
isa<CXXMethodDecl>(FD) // FD is a FunctionDecl*
Then you can get the enclosing class of the CXXMethodDecl with its getParent() method, which returns a CXXRecordDecl. A plain FunctionDecl has no such class-returning method.
I'm writing some kind of tool that extracts the interface definitions of C++ code.
In process of writing, I decided to restrict the parser to process only the code that was explicitly marked for processing, and I thought that C++ attributes are the best way to do it.
I'd prefer to add e.g. [[export]] annotations to entities I want to export, but I realised that libTooling is unable to see custom attributes without registering them in Clang code itself (I mean adding the attribute to tools/clang/include/clang/Basic/Attr.td).
Thus, my question is: is there a way to register the attribute without modifying that file (e.g. by registering the attribute programmatically or writing own Attr.td file)?
UPD: I'm using ASTMatchers library for source code analysis, so visitor-based approach probably does not work for me.
From what I can tell it is not possible to register custom attributes without directly modifying libtooling.
If you're willing to use preprocessor macros instead of attributes, there is a workaround that I've used in the past. The basic idea is to declare an empty macro, write a preprocessor callback that records the location of each expansion of that macro in a queue, and then, in an AST visitor, visit the records for classes, methods, or variables and check whether the entity is preceded by our macro.
For the preprocessor you'll need to extend clang::PPCallbacks and implement the MacroExpands method.
void MyPreProcessorCallback::MacroExpands(const clang::Token& MacroNameTok, const clang::MacroDefinition&, const clang::SourceRange Range, const clang::MacroArgs* const Args)
{
// Macro is not named for some reason.
if(!MacroNameTok.isAnyIdentifier())
{ return; }
if(MacroNameTok.getIdentifierInfo()->getName() == "EXPORT")
{
// Save Range into a queue.
}
else
{
return;
}
// If you want arguments you can declare the macro to have varargs and loop
// through the tokens, you can have any syntax you want as they're raw tokens.
// /* Get the argument list for this macro, because it's a
// varargs function all arguments are stored in argument 0. */
// const ::clang::Token* token = Args->getUnexpArgument(0u);
// // All tokens for the argument are stored in sequence.
// for(; token->isNot(::clang::tok::eof); ++token)
// {
// }
}
Inside your RecursiveASTVisitor you can implement visitors that look at the front of the queue and check whether that macro location comes before the declaration in the translation unit. IIRC, visitors of a given type are all executed in order of declaration, so the queue should maintain the order. Because all Decls of one type are visited in order, care has to be taken when distinguishing between functions, variables, and classes.
bool MyAstVisitor::VisitFunctionDecl(::clang::FunctionDecl* const function)
{
if(::llvm::isa<::clang::CXXMethodDecl>(function))
{
// If you want to handle class member methods separately you
// can check here or implement `VisitCXXMethodDecl` and fast exit here.
}
if(ourExportTags.empty())
{
return true;
}
const ::clang::SourceLocation tagLoc = ourExportTags.front().getBegin();
const ::clang::SourceLocation declLoc = function->getBeginLoc();
if(getAstContext().getSourceManager().isBeforeInTranslationUnit(tagLoc, declLoc))
{
ourExportTags.pop_front();
// Handle export;
}
return true;
}
EDIT
I haven't used ASTMatchers before, but you could probably accomplish a similar result by writing a matcher, storing all of the declarations to a list, sorting based on location, and then comparing to the original export tag queue.
DeclarationMatcher matcher = functionDecl().bind("funcDecl");
class MyFuncMatcher : public clang::ast_matchers::MatchFinder::MatchCallback
{
public:
virtual void run(const clang::ast_matchers::MatchFinder::MatchResult& Result)
{
if(const FunctionDecl* func = Result.Nodes.getNodeAs<clang::FunctionDecl>("funcDecl"))
{
// Add to list for future processing
}
}
};
void joinTagsToDeclarations()
{
// Sort the declaration list
for(auto decl : myDeclList)
{
if(ourExportTags.empty())
{
break;
}
const ::clang::SourceLocation tagLoc = ourExportTags.front().getBegin();
const ::clang::SourceLocation declLoc = decl->getBeginLoc();
if(getAstContext().getSourceManager().isBeforeInTranslationUnit(tagLoc, declLoc))
{
ourExportTags.pop_front();
// Handle export;
}
}
}
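The matching step above can be tried out in isolation. The following is a minimal sketch that replaces Clang's SourceLocation with plain unsigned offsets; DeclInfo and its fields are hypothetical stand-ins, and only the queue logic mirrors the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a declaration: a name plus the offset where it begins.
struct DeclInfo {
    std::string name;
    unsigned beginOffset;
};

// Returns the names of the declarations that consumed an export tag.
// `tagOffsets` must be in source order, as a preprocessor callback would record them.
// Each declaration consumes at most one pending tag that precedes it.
std::vector<std::string> joinTagsToDeclarations(std::deque<unsigned> tagOffsets,
                                                std::vector<DeclInfo> decls) {
    assert(std::is_sorted(tagOffsets.begin(), tagOffsets.end()));
    // Sort the declaration list by position in the translation unit.
    std::sort(decls.begin(), decls.end(),
              [](const DeclInfo& a, const DeclInfo& b) {
                  return a.beginOffset < b.beginOffset;
              });
    std::vector<std::string> exported;
    for (const DeclInfo& decl : decls) {
        if (tagOffsets.empty()) break;
        if (tagOffsets.front() < decl.beginOffset) {
            tagOffsets.pop_front();
            exported.push_back(decl.name);
        }
    }
    return exported;
}
```

One consequence of this scheme is that a tag is attached to the next declaration of the collected kind, even if some other kind of entity sits between the tag and that declaration, so the declaration list should contain every kind of entity that can be tagged.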
I have a Function pass, called firstPass, which does some analysis and populates:
A a;
where
typedef std::map< std::string, B* > A;
class firstPass : public FunctionPass {
    A a;
};
typedef std::vector< C* > D;
class B {
    D d;
};
class C {
    // some class packing information about basic blocks;
};
Hence I have a map of vectors keyed by std::string.
I wrote associated destructors for these classes. This pass works successfully on its own.
I have another Function pass, called secondPass, needing this structure of type A to make some transformations. I used
bool secondPass::doInitialization(Module &M) {
errs() << "now running secondPass\n";
a = getAnalysis<firstPass>().getA();
return false;
}
void secondPass::getAnalysisUsage(AnalysisUsage &AU) const {
AU.addRequired<firstPass>();
AU.setPreservesAll();
}
The whole code compiles fine, but I get a segmentation fault when printing this structure at the end of my first pass only if I call my second pass (since B* is null).
To be clear:
opt -load ./libCustomLLVMPasses.so -passA < someCode.bc
prints in doFinalization() and exits successfully
opt -load ./libCustomLLVMPasses.so -passA -passB < someCode.bc
gives a segmentation fault.
How should I wrap this data structure and pass it to the second pass without issues? I tried std::unique_ptr instead of raw ones but I couldn't make it work. I'm not sure if this is the correct approach anyway, so any help will be appreciated.
EDIT:
I solved the segmentation fault; it was caused by my calling getAnalysis in doInitialization(). I wrote a ModulePass to combine my firstPass and secondPass, whose runOnModule is shown below.
bool MPass::runOnModule(Module &M) {
for(Function& F : M) {
errs() << "F: " << F.getName() << "\n";
if(!F.getName().equals("main") && !F.isDeclaration())
getAnalysis<firstPass>(F);
}
StringRef main = StringRef("main");
A& a = getAnalysis<firstPass>(*(M.getFunction(main))).getA();
return false;
}
This also let me control the order in which the functions are processed.
Now I can get the output of a pass but cannot use it as an input to another pass. I think this shows that the passes in llvm are self-contained.
I'm not going to comment on the quality of the data structures based on their C++ merit (it's hard to comment on that just by this minimal example).
Moreover, I wouldn't use the doInitialization method if the actual initialization is that simple, but this is a side comment too. (The doc does not mention anything explicitly about it, but if it is run once per Module while the runOn method is run on every Function of that module, it might be an issue.)
I suspect the main issue is that A a in your firstPass is bound to the lifetime of the pass object, which ends once the pass is done. The simplest change would be to allocate that object on the heap (e.g. new) and return a pointer to it when calling getAnalysis<firstPass>().getA();.
Please note that using this approach might require manual cleanup if you decide to use a raw pointer.
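If manual cleanup is a concern, shared ownership is one way to get the same effect. The sketch below is plain C++ with no LLVM involved: Producer, countBlocks, BlockInfo, and FunctionInfo are hypothetical stand-ins for firstPass, secondPass, C, and B, and it only illustrates the lifetime idea, assuming the lifetime of the pass object really is the cause.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-ins for the question's types; the members are invented for illustration.
struct BlockInfo { int instructionCount; };                  // plays the role of C
struct FunctionInfo { std::vector<BlockInfo> blocks; };      // plays the role of B
using AnalysisResult = std::map<std::string, FunctionInfo>;  // plays the role of A

// Hypothetical producer; in the question this would be firstPass.
class Producer {
public:
    Producer() : result_(std::make_shared<AnalysisResult>()) {}

    void analyze(const std::string& functionName, int instructionCount) {
        (*result_)[functionName].blocks.push_back(BlockInfo{instructionCount});
    }

    // Hands out shared ownership, so the data can outlive this object.
    std::shared_ptr<const AnalysisResult> getA() const { return result_; }

private:
    std::shared_ptr<AnalysisResult> result_;
};

// Hypothetical consumer; in the question this would be secondPass.
std::size_t countBlocks(const std::shared_ptr<const AnalysisResult>& result) {
    assert(result != nullptr);
    std::size_t total = 0;
    for (const auto& entry : *result) total += entry.second.blocks.size();
    return total;
}
```

Storing B and C by value inside the containers, as done here, also removes the raw B* and C* members, so there is nothing left to delete by hand and no pointer that can end up null.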
I am using Antlr4's C++ visitor api to traverse a parse tree. However, I'm struggling to get it functioning correctly. Namely, I'm not sure how to use the visitChildren(ParseTree *tree) call.
I'm given the context for each rule that I have defined. And I can traverse the tree using the contexts: context->accept[RuleContext]([RuleContext]* rule)
However, when I use those I continually visit the same node multiple times.
For instance:
program
    : nameRule
      dateRule
      ( statements )*
      EOF
    ;
nameRule
    : NAME IDENTIFIER ;
dateRule
    : DATE IDENTIFIER ;
statements
    : statementX
    | statementY
    | statementZ
    ;
statementX
    : // do something here
    ;
statementY
    : // do something here
    ;
statementZ
    : // do something here
    ;
IDENTIFIER, DATE, and NAME are terminals.
I build the ANTLR parsing structure as follows:
void Parser::parse() {
    ifstream file(FLAGS_c, ifstream::binary);
    // Convert the file into ANTLR's format.
    ANTLRInputStream stream(file);
    // Give the input to the lexer.
    MyLexer lexer(&stream);
    // Generate the tokens.
    CommonTokenStream tokens(&lexer);
    file.close();
    tokens.fill();
    // Create the parser that will process the tokens.
    MyParser parser(&tokens);
    parser.setBuildParseTree(true);
    MyParser::ProgramContext *tree = parser.program();
    MyVisitor visitor;
    visitor.visitProgram(tree);
}
When I try to traverse the tree, my code looks similar to the following. MyVisitor, which extends MyParserVisitor, is the visitor class I use to traverse the generated tree.
Any MyVisitor::visitProgram(ParserVisitor::ProgramContext *context) {
this->visitNameRule(context->nameRule());
this->visitDateRule(context->dateRule());
if (!this->statements.empty()) {
for (auto &it : this->statements) {
this->visitStatements(it);
}
}
return Any(context);
}
// Omitting name and date rules.
Any MyVisitor::visitStatements(ParserVisitor::StatementContext *context) {
this->visitStatementX(context->statementX());
this->visitStatementY(context->statementY());
this->visitStatementZ(context->statementZ());
return Any(context);
}
In this case, statements X, Y, and Z will be visited every time statements is visited, even if they aren't present in the input program.
Is this the correct way to use this? If it isn't, then I assume the visitChildren(ParseTree *tree) is the correct api to use at each visitor function. But I don't understand how to get access to the ParseTree data structure from the *Context.
This question is not specific to the C++ visitor; it is a general visitor issue in ANTLR4. You are short-circuiting the visitor walk in a way that was not intended. Don't visit particular subtrees manually; instead, call the base implementation to let it do the walk for you, and collect the results in the individual visitStatementXXX functions. Look at this implementation of a (very simple) expression evaluator, used in a unit test (written in C++). Here's a partial copy to demonstrate the principle:
class EvalParseVisitor : public MySQLParserBaseVisitor {
public:
std::vector<EvalValue> results; // One entry for each select item.
bool asBool(EvalValue in) {
if (!in.isNullType() && in.number != 0)
return true;
return false;
};
virtual Any visitSelectItem(MySQLParser::SelectItemContext *context) override {
Any result = visitChildren(context);
results.push_back(result.as<EvalValue>());
return result;
}
virtual Any visitExprNot(MySQLParser::ExprNotContext *context) override {
EvalValue value = visit(context->expr());
switch (value.type) {
case EvalValue::Null:
return EvalValue::fromNotNull();
case EvalValue::NotNull:
return EvalValue::fromNull();
default:
return EvalValue::fromBool(!asBool(value));
}
}
virtual Any visitExprAnd(MySQLParser::ExprAndContext *context) override {
EvalValue left = visit(context->expr(0));
EvalValue right = visit(context->expr(1));
if (left.isNullType() || right.isNullType())
return EvalValue::fromNull();
return EvalValue::fromBool(asBool(left) && asBool(right));
}
...
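The essential part is the calls to visit() and visitChildren(): visit() dispatches to the visitor function that matches the node's actual type, and visitChildren() iterates over the child nodes that exist in the given context, so only visitor functions for elements that are actually present get triggered.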
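The dispatch mechanism can be shown without ANTLR at all. The following is a minimal sketch with hand-written node classes; Visitor, Node, visitChildren, and CountingVisitor here are hypothetical, only loosely shaped like what ANTLR generates, and the real runtime returns values from the visit functions, which is omitted.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct StatementX;
struct StatementY;
struct Statements;

// Hypothetical visitor interface, loosely shaped like a generated ANTLR visitor.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visitStatements(const Statements& node);
    virtual void visitStatementX(const StatementX&) {}
    virtual void visitStatementY(const StatementY&) {}
};

struct Node {
    std::vector<std::unique_ptr<Node>> children;
    virtual ~Node() = default;
    // Each node type dispatches to its own visit function.
    virtual void accept(Visitor& v) const = 0;
};

struct StatementX : Node {
    void accept(Visitor& v) const override { v.visitStatementX(*this); }
};
struct StatementY : Node {
    void accept(Visitor& v) const override { v.visitStatementY(*this); }
};
struct Statements : Node {
    void accept(Visitor& v) const override { v.visitStatements(*this); }
};

// Counterpart of visitChildren(): only children that exist are visited.
inline void visitChildren(Visitor& v, const Node& node) {
    for (const auto& child : node.children) {
        assert(child != nullptr);
        child->accept(v);
    }
}

// Default behaviour: descend into whatever children are present.
inline void Visitor::visitStatements(const Statements& node) {
    visitChildren(*this, node);
}

// Overrides only the callbacks it is interested in.
struct CountingVisitor : Visitor {
    int xCount = 0;
    int yCount = 0;
    void visitStatementX(const StatementX&) override { ++xCount; }
    void visitStatementY(const StatementY&) override { ++yCount; }
};
```

Calling accept on a Statements node that holds two StatementX children increments xCount twice and leaves yCount at zero, because visitStatementY is never reached for a node that is not in the tree.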
To learn LLVM I made a ModulePass that runs through the functions, basic blocks, and finally instructions. At some point I want to dig into the instructions and perform analysis. While reading the documentation I came across http://llvm.org/docs/doxygen/html/classllvm_1_1InstVisitor.html and the documentation recommends using these structures to efficiently traverse IR rather than do a lot of if(auto* I = dyn_cast<>()) lines.
I tried making a variation of the documentation example, but for BranchInst:
struct BranchInstVisitor : public InstVisitor<BranchInst> {
unsigned Count;
BranchInstVisitor() : Count(0) {}
void visitBranchInst(BranchInst &BI){
Count++;
errs() << "BI found! " << Count << "\n";
}
}; // End of BranchInstVisitor
Within my ModulePass, I created the visitor:
for(Module::iterator F = M.begin(), modEnd = M.end(); F != modEnd; ++F){
BranchInstVisitor BIV;
BIV.visit(F);
...
Unfortunately, my call to visit(F) fails when I compile:
error: invalid static_cast from type ‘llvm::InstVisitor<llvm::BranchInst>* const’ to type ‘llvm::BranchInst*’ static_cast<SubClass*>(this)->visitFunction(F);
How do I correctly implement an LLVM InstVisitor? Are InstVisitors supposed to be run outside of passes? If I missed documentation, please let me know where to go.
The template parameter should be the type you're declaring, not a type of instruction, like this:
struct BranchInstVisitor : public InstVisitor<BranchInstVisitor>
Each visitor can override as many visit* methods as you want -- it's not like each visitor is tied to one type of instruction. That wouldn't be very useful.
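The reason the template parameter must be the visitor itself is that the base class casts this to it (the curiously recurring template pattern), which is also what the static_cast in the error message is doing. Here is a minimal sketch of that mechanism; Opcode, Instruction, and InstVisitorSketch are toy types made up for the illustration, not LLVM classes.

```cpp
#include <cassert>
#include <vector>

// Toy instruction kinds; these are NOT LLVM classes.
enum class Opcode { Branch, Add, Load };
struct Instruction { Opcode opcode; };

// CRTP base: SubClass is the visitor type itself, never an instruction type.
template <typename SubClass>
struct InstVisitorSketch {
    void visit(const std::vector<Instruction>& instructions) {
        for (const Instruction& inst : instructions) visit(inst);
    }

    void visit(const Instruction& inst) {
        // This cast is only valid when SubClass really derives from this base.
        SubClass& self = *static_cast<SubClass*>(this);
        switch (inst.opcode) {
            case Opcode::Branch: self.visitBranch(inst); break;
            case Opcode::Add:    self.visitAdd(inst);    break;
            case Opcode::Load:   self.visitLoad(inst);   break;
            default: assert(false && "unknown opcode");
        }
    }

    // Defaults do nothing; a subclass overrides only what it cares about.
    void visitBranch(const Instruction&) {}
    void visitAdd(const Instruction&) {}
    void visitLoad(const Instruction&) {}
};

// One visitor handling two instruction kinds.
struct BranchAndLoadCounter : InstVisitorSketch<BranchAndLoadCounter> {
    unsigned branches = 0;
    unsigned loads = 0;
    void visitBranch(const Instruction&) { ++branches; }
    void visitLoad(const Instruction&) { ++loads; }
};
```

Passing an instruction type as the template argument makes the static_cast target a class that is unrelated to the visitor, which is why the compiler rejects it.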
I'm a newbie at using the STL algorithms and am currently stuck on a syntax error. My overall goal is to filter the source list the way you would with LINQ in C#. There may be other ways to do this in C++, but I need to understand how to use the algorithms.
My user-defined function object to use as my function adapter is
struct is_Selected_Source : public std::binary_function<SOURCE_DATA *, SOURCE_TYPE, bool>
{
bool operator()(SOURCE_DATA * test, SOURCE_TYPE ref)const
{
if (ref == SOURCE_All)
return true;
return test->Value == ref;
}
};
And in my main program, I'm using as follows -
typedef std::list<SOURCE_DATA *> LIST;
LIST; *localList = new LIST;;
LIST* msg = GLOBAL_DATA->MessageList;
SOURCE_TYPE _filter_Msgs_Source = SOURCE_TYPE::SOURCE_All;
std::remove_copy(msg->begin(), msg->end(), localList->begin(),
std::bind1st(is_Selected_Source<SOURCE_DATA*, SOURCE_TYPE>(), _filter_Msgs_Source));
I'm getting the following error in RAD Studio 2010. The error description says: "Your source file used a typedef symbol where a variable should appear in an expression."
"E2108 Improper use of typedef 'is_Selected_Source'"
Edit -
After doing more experimentation in VS2010, which has better compiler diagnostics, I found the problem is that remove_copy only accepts unary functions. I changed the function to a unary one and got it to work.
(This is only relevant if you didn't accidentally omit some of your code from the question, and may not address the exact problem you're having)
You're using is_Selected_Source as a template even though you didn't define it as one. The last line in the 2nd code snippet should read std::bind1st(is_Selected_Source()...
Or perhaps you did want to use it as a template, in which case you need to add a template declaration to the struct.
template<typename SOURCE_DATA, typename SOURCE_TYPE>
struct is_Selected_Source : public std::binary_function<SOURCE_DATA *, SOURCE_TYPE, bool>
{
// ...
};
At a guess (though it's only a guess), the problem is that std::remove_copy expects a value to compare against, but you're supplying a predicate. To use a predicate, you want std::remove_copy_if (and then you'll want to heed @Cogwheel's answer).
I'd also note that:
LIST; *localList = new LIST;;
Looks wrong -- I'd guess you intended:
LIST *localList = new LIST;
instead.
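Putting those points together, a filter along these lines might look as follows. This is a sketch under assumptions: it uses a C++11 lambda instead of bind1st, the enumerators SOURCE_File and SOURCE_Network and the helper selectBySource are invented for the example, and SOURCE_DATA is reduced to the one member the predicate needs.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Stand-ins for the question's types; enumerators other than SOURCE_All are invented.
enum SOURCE_TYPE { SOURCE_All, SOURCE_File, SOURCE_Network };
struct SOURCE_DATA { SOURCE_TYPE Value; };

typedef std::list<SOURCE_DATA*> LIST;

// Copies the entries of `messages` that match `filter` into a new list.
LIST selectBySource(const LIST& messages, SOURCE_TYPE filter) {
    LIST selected;
    // remove_copy_if drops elements for which the predicate is TRUE,
    // so the predicate describes what to leave out.
    std::remove_copy_if(messages.begin(), messages.end(),
                        std::back_inserter(selected),
                        [filter](const SOURCE_DATA* item) {
                            assert(item != nullptr);
                            return filter != SOURCE_All && item->Value != filter;
                        });
    return selected;
}
```

Two details differ from the code in the question. The destination is std::back_inserter(selected) rather than begin() of an empty list, since the algorithm writes through the output iterator and does not grow the container by itself. And because remove_copy_if keeps the elements for which the predicate is false, the predicate is the negation of is_Selected_Source; with C++11, std::copy_if with the un-negated predicate reads more naturally.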