I am using ANTLR4's C++ visitor API to traverse a parse tree, but I'm struggling to get it working correctly. In particular, I'm not sure how to use the visitChildren(ParseTree *tree) call.
I'm given a context for each rule I have defined, and I can traverse the tree through those contexts: context->accept[RuleContext]([RuleContext]* rule)
However, when I use those I keep visiting the same node multiple times.
For instance:
program
    : nameRule
      dateRule
      ( statements )*
      EOF
    ;
nameRule
    : NAME IDENTIFIER ;
dateRule
    : DATE IDENTIFIER ;
statements
    : statementX
    | statementY
    | statementZ
    ;
statementX
    : // do something here
    ;
statementY
    : // do something here
    ;
statementZ
    : // do something here
    ;
IDENTIFIER, DATE, and NAME are terminals.
I build the Antlr parsing structure by the following:
void Parser::parse() {
    ifstream file(FLAGS_c, ifstream::binary);
    // Convert the file into ANTLR's format.
    ANTLRInputStream stream(file);
    // Give the input to the lexer.
    MyLexer lexer(&stream);
    // Generate the tokens.
    CommonTokenStream tokens(&lexer);
    file.close();
    tokens.fill();
    // Create the parser that will consume the tokens.
    MyParser parser(&tokens);
    parser.setBuildParseTree(true);
    MyParser::ProgramContext *tree = parser.program();
    MyVisitor visitor;
    visitor.visitProgram(tree);
}
When I try to traverse the tree, the code looks similar to the following. The class MyVisitor extends MyParserVisitor and is the visitor class I use to traverse the generated tree.
Any MyVisitor::visitProgram(MyParser::ProgramContext *context) {
    this->visitNameRule(context->nameRule());
    this->visitDateRule(context->dateRule());
    for (auto *statement : context->statements()) {
        this->visitStatements(statement);
    }
    return Any(context);
}
// Omitting name and date rules.
Any MyVisitor::visitStatements(MyParser::StatementsContext *context) {
    this->visitStatementX(context->statementX());
    this->visitStatementY(context->statementY());
    this->visitStatementZ(context->statementZ());
    return Any(context);
}
In this case, statements X, Y, and Z will be visited every time statements is visited, even if they aren't present in the input program.
Is this the correct way to use the visitor? If it isn't, then I assume visitChildren(ParseTree *tree) is the correct API to use in each visitor function. But I don't understand how to get access to the ParseTree data structure from the *Context.
This question is not specific to the C++ visitor; it is a general visitor problem in ANTLR4. You are short-circuiting the visitor walk in a way that is not intended. Don't visit specific subtrees manually. Instead, call the base implementation and let it do the walk for you, then collect the results in the individual visitStatementXXX functions. Look at this implementation of a (very simple) expression evaluator, used in a unit test (written in C++). Here's a partial copy to demonstrate the principle:
class EvalParseVisitor : public MySQLParserBaseVisitor {
public:
    std::vector<EvalValue> results; // One entry for each select item.
    bool asBool(EvalValue in) {
        return !in.isNullType() && in.number != 0;
    }
    virtual Any visitSelectItem(MySQLParser::SelectItemContext *context) override {
        // Let the base implementation walk the children that actually exist.
        Any result = visitChildren(context);
        results.push_back(result.as<EvalValue>());
        return result;
    }
    virtual Any visitExprNot(MySQLParser::ExprNotContext *context) override {
        EvalValue value = visit(context->expr());
        switch (value.type) {
            case EvalValue::Null:
                return EvalValue::fromNotNull();
            case EvalValue::NotNull:
                return EvalValue::fromNull();
            default:
                return EvalValue::fromBool(!asBool(value));
        }
    }
    virtual Any visitExprAnd(MySQLParser::ExprAndContext *context) override {
        EvalValue left = visit(context->expr(0));
        EvalValue right = visit(context->expr(1));
        if (left.isNullType() || right.isNullType())
            return EvalValue::fromNull();
        return EvalValue::fromBool(asBool(left) && asBool(right));
    }
    ...
The essential part is the call to visit() or visitChildren(). visit() dispatches to the visitor function that matches the given node, and visitChildren() iterates over the child nodes of the given context, so only visitor functions for elements that actually exist are triggered.
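To make the dispatch idea concrete without pulling in the ANTLR runtime, here is a minimal sketch. Node, Visitor, StatementX and the other types are hypothetical stand-ins, not ANTLR's real classes; they only mimic the roles of accept(), visit() and visitChildren().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for ANTLR's parse tree classes (not the real API).
struct Visitor;

struct Node {
    std::vector<std::unique_ptr<Node>> children;
    virtual ~Node() = default;
    // Each node type dispatches to its own visit function.
    virtual void accept(Visitor &v) = 0;
};

struct StatementX : Node { void accept(Visitor &v) override; };
struct StatementY : Node { void accept(Visitor &v) override; };
struct Statements : Node { void accept(Visitor &v) override; };

struct Visitor {
    std::vector<std::string> log;  // records which visit functions ran

    // Plays the role of visit(): the node picks the matching function.
    void visit(Node &n) { n.accept(*this); }

    // Plays the role of visitChildren(): only children that exist are visited.
    void visitChildren(Node &n) {
        for (auto &child : n.children) visit(*child);
    }

    void visitStatements(Statements &s) { visitChildren(s); }
    void visitStatementX(StatementX &) { log.push_back("X"); }
    void visitStatementY(StatementY &) { log.push_back("Y"); }
};

void StatementX::accept(Visitor &v) { v.visitStatementX(*this); }
void StatementY::accept(Visitor &v) { v.visitStatementY(*this); }
void Statements::accept(Visitor &v) { v.visitStatements(*this); }

// Builds a 'statements' node that holds a single statementX and walks it.
std::vector<std::string> walkSingleX() {
    Statements s;
    s.children.push_back(std::make_unique<StatementX>());
    Visitor v;
    v.visit(s);
    return v.log;
}
```

Because the walk goes through the children that are really in the tree, visitStatementY is never called for an input that contains only a statementX.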
Related
I'm writing a tool that extracts interface definitions from C++ code.
While writing it, I decided to restrict the parser to process only code that was explicitly marked for processing, and I thought C++ attributes would be the best way to do that.
I'd prefer to add e.g. [[export]] annotations to the entities I want to export, but I realised that libTooling is unable to see custom attributes unless they are registered in the Clang code itself (I mean adding the attribute to tools/clang/include/clang/Basic/Attr.td).
Thus, my question is: is there a way to register the attribute without modifying that file (e.g. by registering the attribute programmatically or writing my own Attr.td file)?
UPD: I'm using the ASTMatchers library for source code analysis, so a visitor-based approach probably does not work for me.
From what I can tell, it is not possible to register custom attributes without directly modifying libTooling.
If you're willing to use preprocessor macros instead of attributes, there is a workaround that I've used in the past. The basic idea is to declare an empty macro, write a preprocessor callback that identifies each location of the macro and stores it in a queue, and then, in an AST visitor, visit the records for classes, methods, or variables and check whether the entity is preceded by our macro.
For the preprocessor you'll need to extend clang::PPCallbacks and implement the MacroExpands method.
void MyPreProcessorCallback::MacroExpands(const clang::Token& MacroNameTok, const clang::MacroDefinition&, const clang::SourceRange Range, const clang::MacroArgs* const Args)
{
    // Macro is not named for some reason.
    if(!MacroNameTok.isAnyIdentifier())
    { return; }
    if(MacroNameTok.getIdentifierInfo()->getName() == "EXPORT")
    {
        // Save Range into a queue.
    }
    else
    {
        return;
    }
    // If you want arguments you can declare the macro to have varargs and loop
    // through the tokens, you can have any syntax you want as they're raw tokens.
    // /* Get the argument list for this macro, because it's a
    //    varargs function all arguments are stored in argument 0. */
    // const ::clang::Token* token = Args->getUnexpArgument(0u);
    // // All tokens for the argument are stored in sequence.
    // for(; token->isNot(::clang::tok::eof); ++token)
    // {
    // }
}
Inside your RecursiveASTVisitor you can implement visitors that look at the front of the queue, check whether that macro comes before the current declaration in the translation unit, and pop it if so. IIRC visitors of a type are all executed in order of declaration, so the queue should maintain the order. Note that all Decls of a given type are visited in order, so care has to be taken when distinguishing between functions, variables, and classes.
bool MyAstVisitor::VisitFunctionDecl(::clang::FunctionDecl* const function)
{
    if(::llvm::isa<::clang::CXXMethodDecl>(function))
    {
        // If you want to handle class member methods separately you
        // can check here or implement `VisitCXXMethodDecl` and fast exit here.
    }
    if(ourExportTags.empty())
    {
        return true;
    }
    const ::clang::SourceLocation tagLoc = ourExportTags.front().getBegin();
    const ::clang::SourceLocation declLoc = function->getBeginLoc();
    if(getAstContext().getSourceManager().isBeforeInTranslationUnit(tagLoc, declLoc))
    {
        ourExportTags.pop_front();
        // Handle export;
    }
    return true;
}
EDIT
I haven't used ASTMatchers before, but you could probably accomplish a similar result by writing a matcher, storing all of the declarations to a list, sorting based on location, and then comparing to the original export tag queue.
DeclarationMatcher matcher = functionDecl().bind("funcDecl");
class MyFuncMatcher : public clang::ast_matchers::MatchFinder::MatchCallback
{
public:
    virtual void run(const clang::ast_matchers::MatchFinder::MatchResult& Result)
    {
        if(const FunctionDecl* func = Result.Nodes.getNodeAs<clang::FunctionDecl>("funcDecl"))
        {
            // Add to list for future processing
        }
    }
};
void joinTagsToDeclarations()
{
    // Sort the declaration list
    for(auto decl : myDeclList)
    {
        if(ourExportTags.empty())
        {
            break;
        }
        const ::clang::SourceLocation tagLoc = ourExportTags.front().getBegin();
        const ::clang::SourceLocation declLoc = decl->getBeginLoc();
        if(getAstContext().getSourceManager().isBeforeInTranslationUnit(tagLoc, declLoc))
        {
            ourExportTags.pop_front();
            // Handle export;
        }
    }
}
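The join step can be tried out without Clang. The sketch below is only an illustration under simplifying assumptions: source locations are replaced by plain integer offsets, and Decl is a hypothetical stand-in for a matched declaration. Each tag is consumed by the first declaration that starts after it, which mirrors the loop above.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a declaration: a name plus the offset where it begins.
struct Decl {
    std::string name;
    unsigned begin;
};

// `tags` holds the begin offsets of the EXPORT macro expansions in source order.
// Returns the names of the declarations that consumed a tag.
std::vector<std::string> joinTagsToDeclarations(std::deque<unsigned> tags,
                                                std::vector<Decl> decls) {
    // Sort the declarations by position so they line up with the tag queue.
    std::sort(decls.begin(), decls.end(),
              [](const Decl &a, const Decl &b) { return a.begin < b.begin; });
    std::vector<std::string> exported;
    for (const Decl &decl : decls) {
        if (tags.empty()) break;
        // The front tag precedes this declaration, so this declaration takes it.
        if (tags.front() < decl.begin) {
            tags.pop_front();
            exported.push_back(decl.name);
        }
    }
    return exported;
}
```

With tags at offsets 10 and 50 and declarations starting at 20, 30 and 60, the declarations at 20 and 60 are the ones treated as exported.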
I'm currently building a library that parses XML definitions of hardware configurations (obtained from the manufacturer).
I've mapped the XML types to C++ classes, and I use std::optional wherever there is an optional XML member and it is semantically correct for that piece of data to be missing.
I'm now trying to come up with a good error-handling strategy for my data types.
Sometimes the XML may be missing information that the schema marks as required, or a required element might not be found at all (which is a different error from an element that is missing required data).
The basic idea for all the types follows this (example) class:
class TMyXmlType {
    std::string name;
    std::optional<int> factor;
    std::optional<int> minFactor;
    std::optional<int> maxFator;
public:
    TMyXmlType(const xml_node & root){
        if(root){ // Check if the element exists
            name = root.value();
            if(root.has_child("factor")){ factor = root.child("factor").value(); }
            if(root.has_child("minFactor")){ minFactor = root.child("minFactor").value(); }
            if(root.has_child("maxFator")){ maxFator = root.child("maxFator").value(); }
        }else{
            // What do I do here?
        }
    }
    operator Json::Value() const {
        if(/*object constructed correctly?*/){
            Json::Value asJson;
            asJson["name"] = name;
            if(factor.has_value()){ asJson["factor"] = factor.value(); }
            if(minFactor.has_value()){ asJson["minFactor"] = minFactor.value(); }
            if(maxFator.has_value()){ asJson["maxFator"] = maxFator.value(); }
            return asJson;
        }else{
            // return error object?
        }
    }
};
So far, so good. The optional members are being taken care of.
However, root might be an empty node (the XML parsing library returns an empty node if the element wasn't found).
I basically want to return an error object instead of the value of the class (in my operator function) if one or more required XML nodes weren't found.
As far as I was able to find, in modern C++ you're supposed to throw an exception if the constructor can't construct the object correctly. However, if I throw exceptions, my aggregate data types will end up with massive constructors containing a try/catch block for each required data member, which would make the codebase a pain to read and maintain.
So now, the question is: What would be the cleanest way to have the operator return an error object instead of the class data if a required member is missing?
I don't need the constructor to fail explicitly. It currently won't ever throw (as far as I know), and I really want error objects to give to the caller instead of return codes or bubbling exceptions.
Plain and simple: when you don't want an error to fire, don't do anything fancy. E.g. just do:
class TMyXmlType {
    std::string name;
    std::optional<int> factor;
    std::optional<int> minFactor;
    std::optional<int> maxFator;
public:
    TMyXmlType(const xml_node & root){
        if(root){ // Check if the element exists
            name = root.value();
            if(root.has_child("factor")){ factor = root.child("factor").value(); }
            if(root.has_child("minFactor")){ minFactor = root.child("minFactor").value(); }
            if(root.has_child("maxFator")){ maxFator = root.child("maxFator").value(); }
        }else{
            name = "NODE_ERROR";
        }
    }
};
That being said, reconsider your design: passing an invalid node to the constructor is probably an error you don't want to silence, and it should probably throw an exception!
(But catch the exception at a higher level, not around every construction. Perhaps that is your underlying misconception.)
Checking whether a search found a node should be done at the level that calls the search. E.g. the following code seems sensible to me:
const auto& node_found = searchNode(...);
const auto optXmlType = node_found ? std::make_optional<TMyXmlType>(node_found) : std::nullopt;
However, you might want to wrap that in a function.
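One possible shape for such a wrapper is sketched below. XmlNode is a hypothetical minimal stand-in for the XML library's node type, TMyXmlType is cut down to a single member, and makeIfFound is a made-up helper name.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical minimal node type standing in for the XML library's node.
struct XmlNode {
    bool found = false;
    std::string text;
    explicit operator bool() const { return found; }
    const std::string &value() const { return text; }
};

// Cut-down version of the class from the question (one member only).
struct TMyXmlType {
    std::string name;
    explicit TMyXmlType(const XmlNode &root) : name(root.value()) {}
};

// Constructs a T only if the node was found; otherwise returns std::nullopt.
template <typename T>
std::optional<T> makeIfFound(const XmlNode &node) {
    if (!node) return std::nullopt;
    return T(node);
}
```

The constructor then only ever sees valid nodes, and the caller decides what a missing node means (skip it, report an error, or throw once at a higher level).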
In my code I have an if-else block condition like this:
public String method (Info info) {
    if (info.isSomeBooleanCondition) {
        return "someString";
    }
    else if (info.isSomeOtherCondition) {
        return "someOtherString";
    }
    else if (info.anotherCondition) {
        return "anotherStringAgain";
    }
    else if (lastCondition) {
        return "string ...";
    }
    else return "lastButNotLeastString";
}
Each conditional branch returns a String.
Since if-else chains are difficult to read, test and maintain, how can I replace this one?
I was thinking of using the Chain of Responsibility pattern. Is that right in this case?
Is there any other elegant way that I can do that?
I am left to assume that your code does not live in the Info class, since info is passed in and referenced for all but the last condition. My first instinct would be to turn String OtherClass.method(Info) into String Info.method() and have it return the appropriate string.
Next, I would take a look at the conditions. Are they really conditions, or can they be mapped to a table? Whenever I see code performing a lookup such as this, I tend to fall back on fitting it into a dictionary or map so I can look up the value.
If you are left with conditions that must be checked, then I would begin thinking about lambdas, delegates or a custom interface. A series of if..then checks across the same type can easily be represented that way. You then collect the checks and execute them in order. IMO, this makes the if..then bunch much clearer. It is more code, but that is secondary at this point.
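interface InfoCheck {
    // Returns the string for this check, or Optional.empty() if it does not apply.
    Optional<String> check(Info info);
}
private final List<InfoCheck> checks = new ArrayList<>();
public OtherClass() {
    // Set up the checks in priority order.
    checks.add(info -> info.isSomeBooleanCondition
            ? Optional.of("someString") : Optional.<String>empty());
    // ... one entry per remaining condition
}
public String method(Info info) {
    for (InfoCheck check : checks) {
        Optional<String> result = check.check(info);
        if (result.isPresent()) {
            return result.get();
        }
    }
    return "lastButNotLeastString";
}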
The problem statement does not fit an ideal Chain of Responsibility scenario, because these are either/or conditions that look chained but actually are not. The reason: in the Chain of Responsibility pattern, all the links in the chain are processed irrespective of what happened in the previous links, i.e. no links are skipped (you can configure which links to process, but the execution of a link still does not depend on the outcome of a previous one). In this if-else-if scenario, however, once a condition matches, the remaining conditions are not evaluated.
I have thought of an alternative design that achieves the above without if-else. It is lengthier, but also more flexible.
Let's say we have a functional interface IfElseReplacer which takes an Info as input and gives a String as output.
@FunctionalInterface
public interface IfElseReplacer {
    String executeCondition(Info info);
}
Then the above conditions can be rephrased as lambda expressions that return the default string when the condition does not match:
(Info info) -> info.someCondition ? "someString" : "lastButNotLeastString"
(Info info) -> info.anotherCondition ? "someOtherString" : "lastButNotLeastString"
and so on...
Then we need a processConditions method to process these lambdas. It could be a default method in IfElseReplacer:
default String processConditions(List<IfElseReplacer> ifElseReplacerList, Info info) {
    String strToReturn = "lastButNotLeastString";
    for (IfElseReplacer ifElseRep : ifElseReplacerList) {
        strToReturn = ifElseRep.executeCondition(info);
        // Exit the loop as soon as executeCondition returns something
        // other than "lastButNotLeastString".
        if (!"lastButNotLeastString".equals(strToReturn)) {
            break;
        }
    }
    return strToReturn;
}
What remains now is the following (I am skipping the code for this; let me know if you need it and I will write it as well). From wherever the if-else conditions need to be checked:
Create the lambda expressions as explained above, assign them to IfElseReplacer instances, and add them to a list of type IfElseReplacer.
Pass this list to the default method processConditions() along with an instance of Info.
The default method returns a String that is the same as the result of the if-else-if block given in the problem statement.
I'd simply factor out the returns:
return
info.isSomeBooleanCondition ? "someString" :
info.isSomeOtherCondition ? "someOtherString" :
info.anotherCondition ? "anotherStringAgain" :
lastCondition ? "string ..." :
"lastButNotLeastString"
;
From the limited information about the problem, and the code given, it looks like this is a case of type-switching. The default solution would be to use inheritance for that:
abstract class Info {
    public abstract String method();
}
class BooleanCondition extends Info {
    public String method() {
        return "something";
    }
}
class SomeOther extends Info {
    public String method() {
        return "somethingElse";
    }
}
Patterns that are interesting in this case are Decorator, Strategy and Template Method. Chain of Responsibility has a different focus: each element in the chain implements logic to process some commands, and when chained, an object forwards a command if it cannot process it. This gives a loosely coupled structure for processing commands, with no central dispatch needed.
If computing the string from the conditions is an operation (and from the name of the class I am guessing that it is probably an expression tree), you should look at the Visitor pattern.
I want to parallelise a seemingly straightforward problem using tbb::tasks. My tasks can be split into sub-tasks, the number of which cannot be chosen, but is determined by the state of the task (not known in advance). As a parent task does not require the results of its sub-tasks, I'd like to recycle the parent as its child. I couldn't find a good working example of this in the online documentation or examples, hence my question here. My current idea is to code along these lines:
struct my_task : tbb::task {
    typedef implementation_defined task_data;
    task_data DATA;
    my_task(task_data const&data) : DATA(data) {}
    void reset_state(task_data const&data) { DATA=data; }
    bool is_small() const;
    void serial_execution();
    bool has_more_sub_tasks() const;
    task_data parameters_for_next_sub_task();
    tbb::task*execute()
    {
        if(is_small()) {
            serial_execution();
            return nullptr;
        }
        tbb::empty_task&Continuation = allocate_continuation(); // <-- correct?
        task_data first_sub_task = parameters_for_next_sub_task();
        int sub_task_counter = 1;
        tbb::task_list further_sub_tasks;
        for(; has_more_sub_tasks(); ++sub_task_counter)
            further_sub_tasks.push_back(*new(Continuation.allocate_child())
                                        my_task(parameters_for_next_sub_task()));
        Continuation.set_ref_count(sub_task_counter); // <-- correct?
        spawn(further_sub_tasks);
        recycle_as_child_of(Continuation); // <-- correct?
        reset_state(first_sub_task); // change state
        return this; // <-- correct?
    }
};
my_task*root_task = new(tbb::task::allocate_root())
                    my_task(parameters_for_root_task());
tbb::task::spawn_root_and_wait(*root_task);
Is this a correct and/or the best way to do this? (note that in my code above the empty continuation task is neither spawned nor returned)
The line that creates the continuation should be:
tbb::empty_task&Continuation = *new( allocate_continuation() ) tbb::empty_task;
The logic between set_ref_count and reset_state looks correct.
I am trying to parse the header of a SIP packet. SIP is a text-based protocol similar to HTTP.
The fields in the header do not have a fixed order.
For example, if there are 3 fields, f1, f2, and f3, they can come in any order, any number of times, say f3, f2, f1, f1.
This is increasing the complexity of my parser since I don't know which will come first.
What should I do to overcome this complexity?
Ultimately, you simply need to decouple your processing from the order of receipt. To do that, have a loop that repeats while fields are encountered, and inside the loop determine which field type it is, then dispatch to the processing for that field type. If you can process the fields immediately great, but if you need to save the potentially multiple values given for a field type you might - for example - put them into a vector or even a shared multimap keyed on the field name or id.
Pseudo-code:
Field x;
while (x = get_next_field(input))
{
    switch (x.type())
    {
        case Type1: field1_values.push_back(x.value()); break;
        case Type2: field2 = x.value(); break; // just keep the last value seen...
        default: throw std::runtime_error("unsupported field type");
    }
}
// use the field1_values / field2 etc. variables....
Tony already gave the main idea, I'll get more specific.
The basic idea in parsing is that it is generally separated into several phases. In your case you need to separate the lexing part (extracting the tokens) from the semantic part (acting on them).
You can proceed in different ways. Since I prefer a structured approach, let us suppose that we have a simple struct representing the header:
struct SipHeader {
    int field1;
    std::string field2;
    std::vector<int> field3;
};
Now, we create a function that takes a field name and its value, and fills the corresponding field of the SipHeader structure appropriately.
void parseField(std::string const& name, std::string const& value, SipHeader& sh) {
    if (name == "Field1") {
        sh.field1 = std::stoi(value);
        return;
    }
    if (name == "Field2") {
        sh.field2 = value;
        return;
    }
    if (name == "Field3") {
        // ...
        return;
    }
    throw std::runtime_error("Unknown field");
}
Then you iterate over the lines of the header and, for each line, separate the name from the value and call this function.
There are obviously refinements:
instead of an if-chain you can use a map of functors, or you can fully tokenize the source and store the fields in a std::map<std::string, std::string>
you can use a state-machine technique to act on the input immediately without copying
but the essential advice is the same:
To manage complexity you need to separate the task into orthogonal subtasks.
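Putting the pieces together, here is a rough sketch of the line loop combined with the map-of-functors refinement. It is a simplification rather than a real SIP parser: Field1, Field2 and Field3 are the placeholder names used above, the handlers() table and parseHeader() are names made up for this example, and it assumes one "Name: value" pair per line with exact-case names and integer values for Field1 and Field3.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct SipHeader {
    int field1 = 0;
    std::string field2;
    std::vector<int> field3;
};

using Handler = std::function<void(const std::string &, SipHeader &)>;

// Table mapping each field name to the code that stores its value.
const std::map<std::string, Handler> &handlers() {
    static const std::map<std::string, Handler> table = {
        {"Field1", [](const std::string &v, SipHeader &sh) { sh.field1 = std::stoi(v); }},
        {"Field2", [](const std::string &v, SipHeader &sh) { sh.field2 = v; }},
        {"Field3", [](const std::string &v, SipHeader &sh) { sh.field3.push_back(std::stoi(v)); }},
    };
    return table;
}

// Parses "Name: value" lines in any order; each handler decides what a repeat means.
SipHeader parseHeader(const std::string &text) {
    SipHeader sh;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR of CRLF
        if (line.empty()) break;  // a blank line ends the header
        const auto colon = line.find(':');
        if (colon == std::string::npos) throw std::runtime_error("Malformed line");
        const std::string name = line.substr(0, colon);
        const auto start = line.find_first_not_of(" \t", colon + 1);
        const std::string value = start == std::string::npos ? "" : line.substr(start);
        const auto it = handlers().find(name);
        if (it == handlers().end()) throw std::runtime_error("Unknown field");
        it->second(value, sh);  // dispatch on the field name, not on its position
    }
    return sh;
}
```

The loop never cares which field comes first. Supporting a new field means adding one entry to the table, and the policy for repeated fields (keep the last value, or collect all of them) lives in the handler for that field.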