Can I use virtual tokens (tokens with identical return values) in ANTLR4, similar to C++?

In C++ I can use virtual functions to process data from similar classes that have the same parent/ancestor. Does ANTLR4 support this, and how would I have to set up the grammar?
I have tried to set up a grammar, using arguments that have the same return value and use that value in a token that contains the different "subclassed" tokens.
Here is some code I have tried to work with:
amf_group
: statements=amf_statements (GROUPSEP WS? LINE_COMMENT? EOL? | EOF)
;
amf_statements returns [amf::AmfStatements stmts]
: ( WS? ( stmt=amf_statement { stmts.emplace_back(std::move($stmt.stmtptr)); } WS? EOL) )*
;
amf_statement returns [amf::AmfStatementPtr stmtptr]
: (
stmt = jsonparent_statement
| stmt = jsonvalue_statement
)
{
$stmtptr = std::move($stmt.stmtptr);
}
;
jsonparent_statement returns [amf::AmfStatementPtr stmtptr] locals [int lineno=0]
:
(T_JSONPAR { $lineno = $T_JSONPAR.line;} ) WS (arg=integer_const)
{
$stmtptr = std::make_shared<amf::JSONParentStatement>($lineno, nullptr);
}
;
jsonvalue_statement returns [amf::AmfStatementPtr stmtptr] locals [int lineno=0]
: ( T_JSONVALUE { $lineno = $T_JSONVALUE.line; } ) WS (arg=integer_const) (WS fmt=integer_const)?
{
$stmtptr = std::make_shared<amf::JSONValueStatement>($lineno, std::move($arg.argptr), std::move($fmt.argptr));
}
;
I receive the following error:
error(75): amf1.g4:23:10: label stmt=jsonvalue_statement type mismatch with previous definition: stmt=jsonparent_statement
This error is of course quite logical, because the tokens are indeed of a different type, but their return value types are identical. For two (virtual) tokens I can write all the code separately, but in my case I have some 40+ different tokens that represent either arguments or statements, and writing out all the combinations would be cumbersome. The above code did work in ANTLR3, by the way.
Is there another way to get around these errors using ANTLR4? Does anybody have any suggestions?

What's specified as a rule return value is not really a return value in a functional sense. Instead, the context representing the rule gets a new member field that holds the "return" value. Given that, it makes no sense to treat parser rules like C++ functions; they are simply not comparable.
Instead of handling all the fields in your grammar, I recommend a different approach: with ANTLR4 you get a parse tree (if enabled), which represents the matched rules using parser rule contexts (a more detailed view than the AST that earlier ANTLR versions generated). Each context contains all the values that have been parsed out. You just need a listener in a second step after the parse run (often called the semantic phase) to walk over this tree, pick those values up, and create your own data structures from them. This separation also allows you to use your parser for quick syntax checks, since you don't do all the heavy work in the parse run.

Related

How to do string formatting in BetterC mode?

I'd like to use something like the "Concepts" package from Atila Neves.
I implemented the check of an object against a type signature myself in a simple naive way. I can check struct objects against interfaces which I define within compile-time-evaluated delegate blocks to make them work with BetterC. I only used compile-time function evaluation with enums which receive return values of executed delegate code blocks.
Now I have run into problems with std.format.format, which uses TypeInfo for %s formatters and therefore gives errors when compiling in BetterC. For code generation I'd like to use token strings because they have syntax highlighting, but proper usage of them requires string interpolation or string formatting. core.stdc.stdio.snprintf is no alternative because CTFE can only interpret D source code.
This is not technically a problem. I can just turn token strings into WYSIWYG strings.
But I wonder why I can't use it. The official documentation says, compile-time features are unrestricted for BetterC (I assume this includes TypeInfo). Either it is plain wrong or I am doing it wrong.
template implementsType(alias symbol, type)
if (isAbstractClass!type)
{
enum implementsType = mixin(implementsTypeExpr);
enum implementsTypeExpr =
{
import std.format : format;
auto result = "";
static foreach(memberName; __traits(allMembers, type))
{
result ~= format(
q{__traits(compiles, __traits(getMember, symbol, "%1$s")) && }~
q{covariantSignature!(__traits(getMember, symbol, "%1$s"), __traits(getMember, type, "%1$s")) && }
, memberName);
}
return (result.length >= 3)? result[0 .. $-3] : result;
}();
}
TypeInfo is not available in BetterC.
There's a bc-string dub package that provides a limited string formatter that will work in BetterC.

Antlr 4 C++ target grammar rule returning lambda, missing attribute access on rule reference error

I'm new to both C++ and Antlr, so pardon my ignorance.
I need millions of values to be derived based on several rules.
Eg rule1:- Value = ob.field1 * ob.field2 //the user defines the rule
Eg rule2:- Value = 4* ob.field4 < 3* ob.field1 ? 5 : ob.field6
So I need to parse the rules only once and generate functors (or lambdas) that I can keep in a map and call at any time. Only the ob instance is different each time.
This is the simple sample I came up with, k is a double value I'm passing as a parameter for this sample, and it would be an object later.
grammar calculator;
start: expr EOF;
expr returns [std::function<double(double)> exprEval]
: left=expr op=('+'|'-') right=expr {$exprEval= [](double k)->double { return $left.exprEval(k) + $right.exprEval(k); }; }
| left=expr op=('*'|'/') right=expr {$exprEval= [](double k)->double { std::cout<<2*k<<std::endl; return -2*k; }; }
| '(' expr ')' {$exprEval= [](double k)->double { std::cout<<k<<std::endl; return -1*k; }; }
| numb {$exprEval= [](double k)->double { std::cout<<-1*k<<std::endl; return k; }; }
;
numb
:DOUBLE
|INT
;
INT
: [0-9]+
;
DOUBLE
: [0-9]+'.'[0-9]+
;
WS
: [ \r\n\t] + -> channel (HIDDEN)
;
It produces the following errors. I think I'm referencing them wrong.
error(67): calculator.g4:6:152: missing attribute access on rule reference left in $left
error(67): calculator.g4:6:172: missing attribute access on rule reference right in $right
The following don't work either:
$left.ctx.exprEval(k) //compilation error : in lambda, localctx is not captured.
ctx.$left.exprEval(k) //compilation error : ctx was not declared in this scope
How do I access the "left" and "right" expression contexts from inside lambda?
Or isn't this the best approach? Is there a better way?
I think parsing the rules every time is not a good idea, since there are millions of records.
You could probably get away with adjusting the capture in your lambdas, but I strongly recommend changing your approach. Don't write all the code in your grammar; instead create a listener (or a visitor if you need to evaluate expressions) and implement all of that there. It is much easier to maintain and you avoid trouble like this.
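As a rough illustration of the visitor idea, here is a minimal sketch that walks a tree once and builds a reusable callable per rule. It does not use the ANTLR4 runtime: Node, compile, number, variable and binary are hypothetical stand-ins for the generated context classes and a visitor over them. The point it shows is the capture: each lambda captures its children's callables by value, so nothing refers back to the parser context.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the parse tree; a real ANTLR4 visitor would
// receive generated context objects instead of these hand-written nodes.
struct Node {
    enum class Kind { Number, Variable, Add, Mul };
    Kind kind = Kind::Number;
    double value = 0.0;                 // used by Number only
    std::unique_ptr<Node> left, right;  // used by Add and Mul only
};

using Eval = std::function<double(double)>;

// Walk the tree once and return a callable that can be invoked many times.
Eval compile(const Node& n) {
    switch (n.kind) {
    case Node::Kind::Number: {
        const double v = n.value;
        return [v](double) { return v; };
    }
    case Node::Kind::Variable:
        return [](double k) { return k; };
    case Node::Kind::Add:
    case Node::Kind::Mul: {
        assert(n.left && n.right);
        // Capture the child callables by value so the result does not
        // depend on the tree (or any parser context) staying alive.
        Eval l = compile(*n.left);
        Eval r = compile(*n.right);
        if (n.kind == Node::Kind::Add)
            return [l, r](double k) { return l(k) + r(k); };
        return [l, r](double k) { return l(k) * r(k); };
    }
    }
    return [](double) { return 0.0; };  // not reached for valid kinds
}

// Small helpers for building trees by hand.
std::unique_ptr<Node> number(double v) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Number;
    n->value = v;
    return n;
}

std::unique_ptr<Node> variable() {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Variable;
    return n;
}

std::unique_ptr<Node> binary(Node::Kind kind, std::unique_ptr<Node> l,
                             std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->kind = kind;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}
```

The resulting Eval objects could then be stored in a std::map keyed by rule name and called once per record, so each rule is parsed only once.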

How to pass token kind with its associated information from lexer to preprocessor, then to parser

I am trying to implement a simple C/C++ parser that parses only part of the C++ language. So I need to create a Lexer, a Preprocessor and a Parser class.
I'm considering which data type to use to pass information between those three layers. Normally a Token class is needed here; right now my Token class looks like this:
struct Token
{
TokenKind id;
std::string lexeme;
int fileIndex;
int line;
int column;
};
I think the most important part is the TokenKind (it could be IDENTIFIER, CLASS_KEYWORD, or a punctuation kind such as LPAREN). Sometimes the lexeme is also important, because it usually contains the type name or variable name.
I looked at how some implementations pass the Token to the parser.
1, Clang has functions in its Preprocessor class such as this one at Preprocessor.cpp:739
void Preprocessor::Lex(Token &Result)
A reference is passed as the function argument, and the function fills the object with the result. Another example is in a Clang tutorial, Clang-tutorial/CItutorial3.cpp at master · loarabia/Clang-tutorial, where the instance tok is reused in a loop:
Token tok;
do {
ci.getPreprocessor().Lex(tok);
if( ci.getDiagnostics().hasErrorOccurred())
break;
ci.getPreprocessor().DumpToken(tok);
std::cerr << std::endl;
} while ( tok.isNot(clang::tok::eof));
2, For some lexer generators, the function yylex() just returns an int, which is actually a TokenKind, and the other information, such as the actual lexeme string, is stored in a global variable like yylval.
3, In a tiny language for GCC (A tiny GCC front end – Part 3 | Think In Geek), the Lexer returns a std::shared_ptr<Token>, that is:
static TokenPtr
make_identifier (location_t locus, const std::string& str)
{
return TokenPtr(new Token (IDENTIFIER, locus, str));
}
The Lexer returns a TokenPtr, which is a smart pointer to the Token object, so the whole Token is returned to the Parser.
4, GCC's cpp library has an interface, the cpp_get_token() function, used like below:
const cpp_token *token = cpp_get_token (pfile);
Then token->type is just like the TokenKind field.
So, my question is: what are the advantages and disadvantages of these kinds of implementations? Some of the methods mentioned above do not even have a preprocessor layer, whereas I do need three layers (the lexer, the preprocessor and the parser).
Note that my parser won't be as big as Clang's or GCC's parser. My main idea is that my parser only parses a very limited part of the C++ language, and I would like to write all of it by hand.
EDIT A similar question is here: What should be the datatype of the tokens a lexer returns to its parser? I also posted some comments there several days ago, but that question does not involve the three layers.
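To make the comparison concrete, here is a minimal sketch of the first style (fill a caller-owned Token through a reference) next to a return-by-value variant of the third style. Everything in it is hypothetical and heavily simplified: the Lexer class, its lex and next members, and the reduced TokenKind with only Word, Punctuation and EndOfFile are illustrations, not code from Clang or GCC.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

enum class TokenKind { Word, Punctuation, EndOfFile };

struct Token {
    TokenKind id = TokenKind::EndOfFile;
    std::string lexeme;
    int line = 1;
    int column = 1;
};

// Hypothetical minimal lexer: words (letters, digits, '_'),
// single-character punctuation, and whitespace skipping.
class Lexer {
public:
    explicit Lexer(std::string source) : src_(std::move(source)) {}

    // Style 1: fill a caller-owned token. The caller can reuse one Token
    // in a loop, so the lexeme's string buffer is recycled between calls.
    void lex(Token& out) {
        skipWhitespace();
        out.line = line_;
        out.column = column_;
        out.lexeme.clear();
        if (pos_ >= src_.size()) {
            out.id = TokenKind::EndOfFile;
            return;
        }
        if (isWordChar(src_[pos_])) {
            out.id = TokenKind::Word;
            while (pos_ < src_.size() && isWordChar(src_[pos_])) {
                out.lexeme.push_back(src_[pos_]);
                advance();
            }
        } else {
            out.id = TokenKind::Punctuation;
            out.lexeme.push_back(src_[pos_]);
            advance();
        }
    }

    // Style 3 without the smart pointer: hand back a fresh token by value.
    // Such tokens are easy to keep in a lookahead buffer.
    Token next() {
        Token t;
        lex(t);
        return t;
    }

private:
    static bool isWordChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    }

    // Consume one character and keep line/column up to date.
    void advance() {
        if (src_[pos_] == '\n') {
            ++line_;
            column_ = 1;
        } else {
            ++column_;
        }
        ++pos_;
    }

    void skipWhitespace() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_])) != 0) {
            advance();
        }
    }

    std::string src_;
    std::size_t pos_ = 0;
    int line_ = 1;
    int column_ = 1;
};
```

With an interface like this, a preprocessor layer could expose the same two functions and pull from the lexer internally, so the parser would not need to know which layer it is talking to.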

How to replace an if-else block condition

In my code I have an if-else block condition like this:
public String method (Info info) {
if (info.isSomeBooleanCondition) {
return "someString";
}
else if (info.isSomeOtherCondition) {
return "someOtherString";
}
else if (info.anotherCondition) {
return "anotherStringAgain";
}
else if (lastCondition) {
return "string ...";
}
else return "lastButNotLeastString";
}
Each conditional branch returns a String.
Since if-else statements are difficult to read, test and maintain, how can I replace them?
I was thinking of using the Chain of Responsibility pattern. Is that right in this case?
Is there any other elegant way that I can do that?
I am left to assume that your code does not live in the Info class, since Info is passed in and referenced for all but the last condition. My first instinct would be to turn String OtherClass.method(Info) into String Info.method() and have it return the appropriate string.
Next, I would take a look at the conditions. Are they really conditions, or can they be mapped to a table? Whenever I see code performing a lookup such as this, I tend to fall back on fitting it into a dictionary or map so I can look the value up.
If you are left with conditions that must be checked, then I would begin thinking about lambdas, delegates or a custom interface. A series of if..then checks across the same type can easily be represented that way. You then collect the checks and execute them in order. IMO, this makes the if..then bunch much clearer. It is more code, but that is secondary at this point.
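import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

interface InfoCheck {
    // Returns the string for this check, or Optional.empty() if it does not apply.
    Optional<String> tryCheck(Info info);
}

public class OtherClass {
    private final List<InfoCheck> checks = new ArrayList<>();

    public OtherClass() {
        // Set up the checks in the order they should be tried.
        checks.add(info -> info.isSomeBooleanCondition ? Optional.of("someString") : Optional.empty());
        // ... one entry per remaining condition
    }

    public String method(Info info) {
        for (InfoCheck check : checks) {
            Optional<String> result = check.tryCheck(info);
            if (result.isPresent()) {
                return result.get();
            }
        }
        // No check applied, so fall back to the final else branch.
        return "lastButNotLeastString";
    }
}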
The problem statement does not fit an ideal Chain of Responsibility scenario, because these are either/or conditions that look 'chained' but actually are not. The reason: in the Chain of Responsibility pattern one processes all the chain links irrespective of what happened in the previous links, i.e. no chain links are skipped (although you can configure which chain links to process and which not, the execution of a chain link still does not depend on the outcome of a previous one). In this if-else-if* scenario, however, once an if condition matches, the remaining conditions are not evaluated.
I have thought of an alternative design which achieves the above without if-else. It is lengthier, but at the same time more flexible.
Let's say we have a functional interface IfElseReplacer which takes an Info as input and gives a String as output.
@FunctionalInterface
public interface IfElseReplacer {
    String executeCondition(Info info);
}
Then the above conditions can be rephrased as lambda expressions, each returning the default string when its condition does not match:
(Info info) -> info.someCondition ? "someString" : "lastButNotLeastString"
(Info info) -> info.anotherCondition ? "someOtherString" : "lastButNotLeastString"
and so on...
Then we need a processConditions method to process these lambdas. It could be a default method in IfElseReplacer:
default String processConditions(List<IfElseReplacer> ifElseReplacerList, Info info) {
    String strToReturn = "lastButNotLeastString";
    for (IfElseReplacer ifElseRep : ifElseReplacerList) {
        strToReturn = ifElseRep.executeCondition(info);
        if (!"lastButNotLeastString".equals(strToReturn)) {
            break; // a condition matched (the value is no longer the default), so stop checking
        }
    }
    return strToReturn;
}
What remains now (I am skipping the code for this; please let me know if you need it and I will write it as well) is the following, at the place where the if-else conditions need to be checked:
Create the lambda expressions as explained above and add them to a list of type IfElseReplacer.
Pass this list to the default method processConditions() along with an instance of Info.
The default method returns the String value, which is the same as the result of the if-else-if* block given in the problem statement.
I'd simply factor out the returns:
return
info.isSomeBooleanCondition ? "someString" :
info.isSomeOtherCondition ? "someOtherString" :
info.anotherCondition ? "anotherStringAgain" :
lastCondition ? "string ..." :
"lastButNotLeastString"
;
From the limited information about the problem and the code given, it looks like this is a case of type-switching. The default solution would be to use inheritance for that:
abstract class Info {
    public abstract String method();
}

class BooleanCondition extends Info {
    @Override
    public String method() {
        return "something";
    }
}

class SomeOther extends Info {
    @Override
    public String method() {
        return "somethingElse";
    }
}
Patterns which are interesting in this case are Decorator, Strategy and Template Method. Chain of Responsibility has a different focus. Each element in the chain implements logic to process some commands. When chained, an object forwards the command if it cannot process it. This gives a loosely coupled structure for processing commands where no central dispatch is needed.
If computing the string on the conditions is an operation, and from the name of the class I am guessing that it is probably an expression tree, you should look at the Visitor pattern.

How to insert through bison actions OR return or pass an object to bison parser?

I have a GUI program which uses bison to do parsing of files (containing pin connection list of ICs).
The input file is like->
3 = NAND2(1, 2)
6 = NAND2(4, 5)
8 = NAND2(9, 10)
11 = NAND2(12, 13)
And (simplified) grammar is like->
start : spec { cout << "Parse Complete!"; }
;
spec : '{' expr_list '}'
;
expr_list : expr_list expr | expr
;
expr : INT '=' FUNC '(' args ')' { /*WANT TO INSERT ACTION HERE */ }
;
At the place mentioned I want to perform an action -- to insert parsed values in my container (a map). The file generated by bison is .cpp, but really it is C otherwise. I can include required header files easily. Also I can simply call non-member functions (like cout).
My question is: how can I insert values into an object (of container type)? I can create an object inside the grammar file, but then how do I return it?
The other way is to create an object (of container type) in the GUI code and then pass it to the parser.
Please tell me how to achieve either of these.
With return value optimisation, I'd be tempted just to return a map by value.
The solution is extern.
I created the (global) object in .cpp file of my GUI (where I invoke the parser) like:
QStringList *icNameList;
Then in header file (included by bison file)
extern QStringList *icNameList;
Finally in bison file, I can do something like:
abc : xyz { icNameList->append(getTokenString()); }
;
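The other option from the question, creating the container in the GUI code and passing it to the parser, avoids the global. Bison's %parse-param directive adds extra parameters to yyparse for this purpose. The plain C++ sketch below only illustrates the ownership pattern with a hand-written stand-in for the parser: parseNetlist, Gate and Netlist are hypothetical names, and the braces around the expression list in the grammar are ignored.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Gate {
    std::string func;         // e.g. "NAND2"
    std::vector<int> inputs;  // e.g. {1, 2}
};

using Netlist = std::map<int, Gate>;

// Stand-in for the parser entry point: the caller owns the container and
// passes it in by reference. Returns false on the first malformed line.
bool parseNetlist(std::istream& in, Netlist& out) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) {
            continue;  // skip blank lines
        }
        std::istringstream ls(line);
        int output = 0;
        char eq = 0;
        if (!(ls >> output >> eq) || eq != '=') {
            return false;
        }
        Gate gate;
        // Function name: everything up to the opening parenthesis.
        if (!std::getline(ls >> std::ws, gate.func, '(') || gate.func.empty()) {
            return false;
        }
        // Argument list: everything up to the closing parenthesis.
        std::string args;
        if (!std::getline(ls, args, ')') || ls.eof()) {
            return false;  // closing parenthesis missing
        }
        std::istringstream as(args);
        int pin = 0;
        while (as >> pin) {
            gate.inputs.push_back(pin);
            char comma = 0;
            if (!(as >> comma)) {
                break;  // no more arguments
            }
            if (comma != ',') {
                return false;
            }
        }
        // This is where a grammar action would insert the parsed values.
        out[output] = std::move(gate);
    }
    return true;
}
```

The GUI code would then declare a Netlist as a member or local variable, hand it to the parser, and read the results back from the same object once parsing returns.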