Bison C++ mid-rule value lost with variants

I'm using Bison with the lalr1.cc skeleton and api.value.type variant to generate a C++ parser. I tried to use a mid-rule action to return a value for later semantic actions, but the value on the stack appears to become zero. Below is an example.
parser.y:
%require "3.0"
%skeleton "lalr1.cc"
%defines
%define api.value.type variant
%code {
#include <iostream>
#include "parser.yy.hpp"
extern int yylex(yy::parser::semantic_type * yylval);
}
%token <int> NUM
%%
expr: NUM | expr { $<int>$ = 42; } '+' NUM { std::cout << $<int>2 << std::endl; };
%%
void yy::parser::error(const std::string & message){
std::cerr << message << std::endl;
}
int main(){
yy::parser p;
return p.parse();
}
lexer.flex:
%option noyywrap
%option nounput
%option noinput
%{
#include "parser.yy.hpp"
typedef yy::parser::token token;
#define YY_DECL int yylex(yy::parser::semantic_type * yylval)
%}
%%
[ \n\t]+
[0-9]+ {
yylval->build(atoi(yytext));
return token::NUM;
}
. {
return yytext[0];
}
compiling:
bison -o parser.yy.cpp parser.y
flex -o lexer.c lexer.flex
g++ parser.yy.cpp lexer.c -O2 -Wall -o parser
A simple input like 2+2 should print 42, but it prints 0 instead. When the variant type is replaced with %union, the correct value is printed. As a workaround I've been using marker actions with $<type>-n to get deeper values from the stack, but that approach hurts legibility and maintainability.
I've read in the generated source that when using the variant type, the default action { $$ = $1 } is not executed. Is this an example of that behavior, or is it a bug?

I vote for "bug". It does not have to do with the absence of the default action.
I traced execution to the point where it attempts to push the value of the mid-rule action (MRA) onto the stack. However, it does not get the correct type for the MRA, so the call to yypush_ does nothing, which is clearly not the desired behavior.
If I were you, I'd report the problem.

This was indeed a bug in Bison 3.0.5, but using $<foo>1 is also intrinsically wrong (yet there was no alternative!).
Consider:
// 1.y
%nterm <int> exp
%%
exp: { $<int>$ = 42; } { $$ = $<int>1; }
It is translated into something like:
// 2.y
%nterm <int> exp
%%
#1: %empty { $<int>$ = 42; }
exp: #1 { $$ = $<int>1; }
What is wrong here is that we did not tell Bison the type of the semantic value of #1, so it cannot handle that semantic value correctly (typically, no %printer and no %destructor will be applied). Worse yet, because it does not know what value is actually stored there, it is unable to manipulate it properly: it does not know how to copy it from the local variable $$ to the stack, which is why the value was lost. (In C, the bits of the semantic value are copied blindly, so it works. In C++ with Bison variants, we use the exact assignment/copy operators corresponding to the current type, which requires knowing what that type is.)
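To make this concrete, here is a deliberately simplified C++ toy model. It is not Bison's generated code, and the names declared_type, value_slot, move_value and midrule_value_seen_later are all hypothetical. A slot's contents can only be moved by code that dispatches on the symbol's declared type. An untyped mid-rule symbol (#1 above) gives that code nothing to dispatch on, so the stack slot keeps whatever it already held. In the toy the fresh slot is zero-initialised; in a real parser its contents are unspecified, which is consistent with the asker seeing 0.

```cpp
#include <string>
#include <utility>

// Toy model (not Bison's code): which C++ type a slot holds is known only
// from the symbol's declared type, never from the slot itself.
enum class declared_type { none, integer, text };

struct value_slot {
    int integer = 0;   // used when the declared type is int
    std::string text;  // used when the declared type is std::string
};

// Moves a value the way a typed variant must: by dispatching on the declared
// type. Unlike a C union, there is no "copy the raw bits" fallback.
void move_value(declared_type type, value_slot& from, value_slot& to) {
    switch (type) {
    case declared_type::integer: to.integer = from.integer; break;
    case declared_type::text:    to.text = std::move(from.text); break;
    case declared_type::none:    break;  // untyped symbol: nothing is moved
    }
}

// Simulates `{ $<int>$ = 42; }`, pushing that $$ onto the stack, and reading
// it back later as $<int>2.
int midrule_value_seen_later(declared_type midrule_type) {
    value_slot local;       // the mid-rule action's local $$
    local.integer = 42;     // $<int>$ = 42;
    value_slot stack_slot;  // freshly pushed stack entry
    move_value(midrule_type, local, stack_slot);
    return stack_slot.integer;  // what $<int>2 would read
}
```

Here midrule_value_seen_later(declared_type::none) returns 0. With declared_type::integer, the analogue of giving #1 a type as in 3.y or 4.y below, it returns 42.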
Clearly a better translation would have been to have #1 be given a type:
// 3.y
%nterm <int> exp #1
%%
#1: %empty { $$ = 42; }
exp: #1 { $$ = $1; }
which is actually what anybody would have written.
Thanks to your question, Bison 3.1 now features typed midrule actions:
// 4.y
%nterm <int> exp
%%
exp: <int>{ $$ = 42; } { $$ = $1; }
Once desugared, 4.y produces exactly the same grammar as 3.y.
See http://lists.gnu.org/archive/html/bug-bison/2017-06/msg00000.html and http://lists.gnu.org/archive/html/bug-bison/2018-06/msg00001.html for more details.

Related

How to count assignment operators in a text file?

My task is to create a program in C++ that processes a text file sequentially. The data must be read from the file one line at a time, and the entire contents of the file must not be stored in RAM. The text file contains syntactically correct C++ code, and I have to count how many assignment operators it contains.
The only thing I could think of was writing a function that searches for a pattern and counts how many times it appears. I pass every assignment operator as a pattern and then sum all the counts. But this does not work: with the pattern "=", operators such as "%=" or "+=" also get counted. Even operators like "!=" or "==" get counted, although they shouldn't be, because they are comparison operators.
My code gives the answer 7 but the real answer should be 5.
#include <iostream>
#include <fstream>
using namespace std;
int patternCounting(string pattern, string text){
int x = pattern.size();
int y = text.size();
int rez = 0;
for(int i=0; i<=y-x; i++){
int j;
for(j=0; j<x; j++)
if(text[i+j] !=pattern[j]) break;
if(j==x) rez++;
}
return rez;
}
int main()
{
fstream file ("test.txt", ios::in);
string rinda;
int skaits=0;
if(!file){cout<<"Nav faila!"<<endl; return 47;}
while(file.good()){
getline(file, rinda);
skaits+=patternCounting("=",rinda);
skaits+=patternCounting("+=",rinda);
skaits+=patternCounting("*=",rinda);
skaits+=patternCounting("-=",rinda);
skaits+=patternCounting("/=",rinda);
skaits+=patternCounting("%=",rinda);
}
cout<<skaits<<endl;
return 0;
}
Contents of the text file:
#include <iostream>
using namespace std;
int main()
{
int z=3;
int x=4;
for(int i=3; i<3; i++){
int f+=x;
float g%=3;
}
}
Note that as a torture test, the following code has 0 assignments on older C++ standards and one on newer ones, due to the abolition of trigraphs.
// = Torture test
int a = 0; int b = 1;
int main()
{
// The next line is part of this comment until C++17 ??/
a = b;
struct S
{
virtual void foo() = 0;
void foo(int, int x = 1);
S& operator=(const S&) = delete;
int m = '==';
char c = '=';
};
const char* s = [=]{return "=";}();
sizeof(a = b);
decltype(a = b) c(a);
}
There are multiple issues with the code.
The first, rather mundane, issue is your handling of file reading. A loop such as while (file.good()) … is virtually always an error: you need to test the return value of getline instead!
std::string line;
while (getline(file, line)) {
// Process `line` here.
}
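The difference is easy to observe with an in-memory stream. In this sketch (the two helper functions are hypothetical), the good()-based loop runs its body once more than there are lines when the text ends with a newline. This happens because good() only turns false after a getline call has already failed.

```cpp
#include <sstream>
#include <string>

// Counts loop iterations with the `while (file.good())` pattern.
int iterations_with_good_loop(const std::string& text) {
    std::istringstream file(text);
    std::string line;
    int iterations = 0;
    while (file.good()) {
        std::getline(file, line);  // may fail, but the body still runs
        ++iterations;
    }
    return iterations;
}

// Counts loop iterations when the result of getline itself is tested.
int iterations_with_getline_loop(const std::string& text) {
    std::istringstream file(text);
    std::string line;
    int iterations = 0;
    while (std::getline(file, line)) {
        ++iterations;
    }
    return iterations;
}
```

For the text "a\nb\n", the first function returns 3 (its last iteration sees an empty, failed read) and the second returns 2.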
Next, your patternCounting function fundamentally won’t work since it doesn’t account for comments and strings (nor any of C++’s other peculiarities, but these seem to be out of scope for your assignment). It also doesn’t really make sense to count different assignment operators separately.
The third issue is that your test case misses lots of edge cases (and is invalid C++). Here’s a better test case that (I think) exercises all interesting edge cases from your assignment:
int main()
{
int z=3; // 1
int x=4; // 2
// comment with = in it
"string with = in it";
float f = 3; // 3
f = f /= 4; // 4, 5
for (int i=3; i != 3; i++) { // 6
int f=x += z; // 7, 8
bool g=3 == 4; // 9
}
}
I’ve annotated each line with a comment indicating up to how many occurrences we should have counted by now.
Now that we have a test case, we can start implementing the actual counting logic. Note that, for readability, function names generally follow the pattern “verb subject”. So instead of patternCounting a better function name would be countPattern. But we won’t count arbitrary patterns, we will count assignments. So I’ll use countAssignments (or, using my preferred C++ naming convention: count_assignments).
Now, what does this function need to do?
It needs to count assignments (incl. initialisations), duh.
It needs to discount occurrences of = that are not assignments:
inside strings
inside comments
inside comparison operators
Without a dedicated C++ parser, that’s a rather tall order! You will need to implement a rudimentary lexical analyser (short: lexer) for C++.
First off, you will need to represent each of the situations we care about with its own state:
enum class state {
start,
comment,
string,
comparison
};
With this in hand, we can start writing the outline of the count_assignments function:
int count_assignments(std::string const& str) {
auto count = 0;
auto state = state::start;
auto prev_char = '\0';
for (auto c : str) {
switch (state) {
case state::start:
break;
case state::comment:
break;
case state::string:
break;
case state::comparison:
break;
}
prev_char = c;
}
// Useful for debugging:
// std::cerr << count << "\t" << str << "\n";
return count;
}
As you can see, we iterate over the characters of the string (for (auto c : str)). Next, we handle each state we could currently be in.
The prev_char variable is necessary because some of our lexical tokens are more than one character long (e.g. comments start with //, but /= is an assignment that we want to count!). This is a bit of a hack; for a real lexer I would split such cases into distinct states.
So much for the function skeleton. Now we need to implement the actual logic — i.e. we need to decide what to do depending on the current (and previous) character and the current state.
To get you started, here’s the case state::start:
switch (c) {
case '=':
++count;
state = state::comparison;
break;
case '<': case '>': case '!':
state = state::comparison;
break;
case '"' :
state = state::string;
break;
case '/' :
if (prev_char == '/') {
state = state::comment;
}
break;
}
Be very careful: the above will over-count the comparison ==, so we will need to adjust that count once we’re inside case state::comparison and see that the current and previous character are both =.
I’ll let you take a stab at the rest of the implementation.
Note that, unlike your initial attempt, this implementation doesn’t distinguish the separate assignment operators (=, +=, etc.) because there’s no need to do so: they’re all counted automatically.
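The answer leaves the remaining states as an exercise. Below is one possible completion, offered as a sketch rather than the answerer's own code. It renames the enum to lex_state so the type and the variable have distinct names. It also adds an escaped flag for backslashes inside strings and treats << and >> specially so that <<= and >>= are counted. The helper count_assignments_in is a hypothetical name. Because the function works on one line at a time, block comments, character literals such as '=', and raw strings are not handled.

```cpp
#include <istream>
#include <string>

enum class lex_state { start, comment, string, comparison };

// Counts '=' initialisations and assignment operators (=, +=, -=, *=, /=,
// %=, &=, |=, ^=, <<=, >>=) in one line of C++ source. Simplification:
// only // comments and "..." strings are skipped.
int count_assignments(std::string const& line) {
    int count = 0;
    auto state = lex_state::start;
    char prev_char = '\0';
    bool escaped = false;  // previous character in a string was a backslash

    for (char c : line) {
        if (state == lex_state::comparison) {
            if (c == '=') {
                // "==" was counted once at its first '='; undo that.
                // For "<=", ">=" and "!=" nothing was counted.
                if (prev_char == '=') --count;
                state = lex_state::start;
                prev_char = c;
                continue;
            }
            if ((c == '<' || c == '>') && c == prev_char) {
                // "<<" or ">>": a following '=' forms "<<=" or ">>=".
                state = lex_state::start;
                prev_char = c;
                continue;
            }
            // Any other character ends the operator; handle it as in start.
            state = lex_state::start;
        }

        switch (state) {
        case lex_state::start:
            switch (c) {
            case '=':
                ++count;  // undone above if this turns out to be "=="
                state = lex_state::comparison;
                break;
            case '<': case '>': case '!':
                state = lex_state::comparison;
                break;
            case '"':
                state = lex_state::string;
                break;
            case '/':
                if (prev_char == '/') state = lex_state::comment;
                break;
            }
            break;
        case lex_state::string:
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') state = lex_state::start;
            break;
        case lex_state::comment:     // the rest of the line is ignored
        case lex_state::comparison:  // already handled above
            break;
        }
        prev_char = c;
    }
    return count;
}

// Reads the input one line at a time, as the assignment requires.
int count_assignments_in(std::istream& in) {
    int total = 0;
    std::string line;
    while (std::getline(in, line)) total += count_assignments(line);
    return total;
}
```

Summed over the lines of the annotated test case above, count_assignments gives 9. On the question's test.txt it gives 5.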
The clang compiler has a feature to dump the syntax tree (also called AST). If you have syntactically correct C++ code (which you don't have), you can count the number of assignment operators for example with the following command line (on a unixoid OS):
clang++ -Xclang -ast-dump -c my_cpp_file.cpp | egrep "BinaryOperator.*'='" | wc -l
Note, however, that this will only match real assignments, not copy initializations, which can also use the = character but are syntactically different (for example, an overloaded = operator is not called in that case).
If you want to count the compound assignments and/or the copy initializations as well, you can try to look for the corresponding lines in the output AST and add them to the egrep search pattern.
In practice, your task is incredibly difficult.
Think, for example, of C++ raw string literals (one could span dozens of source lines, with arbitrary = characters inside). Or of asm statements doing some addition...
Think also of increment operators: given int x;, the statement x++; is equivalent to x = x+1; for a simple variable, so it is semantically an assignment, but not syntactically.
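Both pitfalls can be illustrated in a few lines of C++. The helper naive_equals_count and the two sample strings are hypothetical. Counting '=' characters in the source text of a raw string literal also counts the text inside the literal, and an increment modifies a variable without any '=' at all.

```cpp
#include <algorithm>
#include <string>

// Counts '=' characters in a piece of source text, the naive way.
int naive_equals_count(const std::string& source) {
    return static_cast<int>(std::count(source.begin(), source.end(), '='));
}

// One line of source whose raw string literal contains text that looks like
// assignments; only the initialisation of s is real.
const std::string raw_string_program =
    R"src(const char* s = R"(x = 1; y += 2;)";)src";

// x is initialised and then incremented, but only one '=' appears.
const std::string increment_program = "int x = 0; x++;";
```

Here naive_equals_count(raw_string_program) is 3 although only the initialisation of s is real, and naive_equals_count(increment_program) is 1 although x is modified twice.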
My suggestion: choose one open source C++ compiler. I happen to know GCC internals.
With GCC, you can write your own GCC plugin which would count the number of Gimple assignments.
Think also of Quine programs coded in C++...
NB: budget months of work.

How to declare a persistent variable when using left recursion in bison/flex?

Here is the part I'm talking about:
block : statement
{
NBlock* myBlock = new NBlock();
myBlock->AddStatement($1);
}
| block statement
{
std::cout << "More than one statement" << std::endl;
myBlock->AddStatement($2);
}
;
Here is an excerpt from the instructions for this assignment:
The majority of grammar actions will only require setting $$ to a new instance of the node, as with NRotate. But there are a couple of special cases to watch out for.
The main_loop action simply needs to grab the block it has and set the global g_MainBlock pointer to it. This global block should then also have SetMainBlock called on it.
The other special case is the actions for block. When the first statement is matched, you want to construct a new NBlock, and add the statement to this new NBlock’s list of statements. But when subsequent statements are matched, rather than creating a new NBlock, you should simply add the new statement to the already existing NBlock.
How do I achieve this?
ETA:
/* Add one union member for each Node* type */
%union {
Node* node;
NBlock* block;
NStatement* statement;
NNumeric* numeric;
NBoolean* boolean;
std::string* string;
int token;
}
%error-verbose
/* Terminal symbols */
%token <string> TINTEGER
%token <token> TLBRACE TRBRACE TSEMI TLPAREN TRPAREN
%token <token> TMAIN TROTATE TFORWARD TISHUMAN TATTACK TISPASSABLE TISRANDOM TISZOMBIE TRANGED
%token <token> TIF TELSE
/* Statements */
%type <block> main_loop block
%type <statement> statement rotate forward is_human is_passable is_random is_zombie ranged
/* Expressions */
%type <numeric> numeric
You should be able to simply find the original object in $1, and move it into $$, instead of creating a new one.
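The action for the first statement sets $$ to myBlock. The action for the recursive rule takes it from $1 and sets $$ to it.
P.S. You should use smart pointers, e.g. std::shared_ptr, to avoid leaking memory when parsing fails. Note that such types cannot be members of a %union like the one shown above; Bison's variant value type (%define api.value.type variant) is one way to store them.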
Usually you would write this as
block : statement
{
$$ = new NBlock();
$$->AddStatement($1);
}
| block statement
{
std::cout << "More than one statement" << std::endl;
($$ = $1)->AddStatement($2);
}
;
This uses the semantic value of block to pass the created NBlock object between productions.
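The flow of the semantic value can be sketched in plain C++. The NStatement and NBlock classes below are minimal hypothetical stand-ins for the assignment's classes; only AddStatement is taken from the question. The names reduce_first and reduce_more are also hypothetical and stand for the two actions. Each function returns what its action stores in $$.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal stand-ins for the assignment's node classes.
struct NStatement {
    std::string text;
};

struct NBlock {
    std::vector<NStatement*> statements;
    void AddStatement(NStatement* s) { statements.push_back(s); }
};

// Mirrors: block : statement { $$ = new NBlock(); $$->AddStatement($1); }
NBlock* reduce_first(NStatement* s1) {
    NBlock* result = new NBlock();  // $$ = new NBlock();
    result->AddStatement(s1);       // $$->AddStatement($1);
    return result;                  // becomes the semantic value of block
}

// Mirrors: | block statement { ($$ = $1)->AddStatement($2); }
NBlock* reduce_more(NBlock* b1, NStatement* s2) {
    NBlock* result = b1;            // $$ = $1: reuse the existing block
    result->AddStatement(s2);       // ->AddStatement($2)
    return result;
}
```

For the statements a b c, a left-recursive LALR parser reduces block : statement once and then block : block statement twice. That corresponds to reduce_first(a) followed by two reduce_more calls, and every call returns the same NBlock*.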

"Erratic" grammar parsing by JavaCC for the IfStatement?

Starting from the Java 1.1 grammar I have tried to write a simplified grammar for a new language. Trying to parse IfStatements in various shapes I have encountered a strange problem which I don’t understand and for which I can’t find a solution.
I have written two test programs, the first containing:
… if (a > b) { x = 15; } else { x = 30; }
and the second:
… if (a > b) x = 15; else x = 30;
The first program is parsed without any problems, while for the second the parser stops with the message “Encountered else at (start of ELSE) expecting one of: boolean | break | byte | char | continue ...”. According to the grammar (or rather, to my intention) there shouldn’t be any difference between a block and a single statement, and to my astonishment this works perfectly for WHILE and DO, but not for IF.
The relevant pieces of my grammar are as follows:
void Statement() :
{}
{
( LOOKAHEAD(2)
Block() | EmptyStatement() | IfStatement() |
WhileStatement() | ...
}
void Block() :
{}
{ "{" ( BlockStatement() )* "}"
}
void BlockStatement() :
{}
{
( LOOKAHEAD(2) LocalVariableDeclaration() ";" |
Statement()
)
}
void EmptyStatement() :
{}
{ ";" }
void IfStatement() :
{}
{
"if" "(" Expression() ")" Statement()
[ LOOKAHEAD(1) "else" Statement() ]
}
void WhileStatement() :
{}
{
"while" "(" Expression() ")" Statement()
}
There is a second observation, possibly connected with the above problem. As stated before, the first test program is parsed correctly. However, for the “Statement” in the (partial) grammar rule <"if" "(" Expression() ")" Statement()> of the IF statement, the parser yields an overly complex sequence of “Statement, Block, BlockStatement, Statement” nodes, including an EmptyStatement before and after the statement content, whereas parsing a test program containing an equivalent WHILE loop produces just “Statement” and no EmptyStatements at all.
So far I have tried to increase the number of lookahead elements in the IfStatement and/or the Statement productions but without any effect.
What is going wrong here, or what is wrong in my grammar?

Can't find my mistake! error: expected identifier before '(' token

This is my main code. I searched for related mistakes before asking, but it just doesn't seem wrong... The IDE says the error is in line 11.
#include<stdio.h>
int main()
{
float sal;
printf("Digite o salário bruto: ");
scanf("%f",&sal);
if(sal<=2246.75){
printf("Salário líquido : ",sal);
}
else{
if(sal>2246.75)&&(sal<2995.70){
printf("Salário Líquido: ",sal * 0.925);
}
else{
if(sal>2995.70)&&(sal<=3743.19){
printf("Salário Líquido: ",sal * 0.845);
}
else{
printf("Salário Líquido: ", sal * 0.775);
return 0;
}
}
}
}
if(sal>2246.75)&&(sal<2995.70){
The problem is that the entire condition must be placed within a set of parentheses.
It's fine if you want to further enclose the sub-conditions, but you must surround the entire lot, too:
if ((sal > 2246.75) && (sal < 2995.70)) {
Your if statement has to be corrected as follows; you were missing the outer parentheses around the if condition.
if( (sal>2246.75)&& (sal<2995.70)){
You also have to give printf a format specifier for the value, as follows; here the specifier is missing.
printf("Salário Líquido: %f", sal * 0.775);
Both of these errors occur several times in your code.
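Putting both fixes together, here is a sketch of the salary logic in C++. The function names net_salary and print_net_salary are hypothetical. Each compound condition is wrapped in one outer pair of parentheses, and the printf call gets a %.2f specifier for the computed value. The brackets and factors are copied from the question, including its quirk that a gross salary of exactly 2995.70 falls through to the last bracket.

```cpp
#include <cstdio>

// Net salary for a gross salary, using the brackets from the question.
double net_salary(double gross) {
    if (gross <= 2246.75) {
        return gross;
    }
    if ((gross > 2246.75) && (gross < 2995.70)) {   // whole condition in parentheses
        return gross * 0.925;
    }
    if ((gross > 2995.70) && (gross <= 3743.19)) {
        return gross * 0.845;
    }
    return gross * 0.775;  // everything else, including exactly 2995.70
}

// Prints the result; "%.2f" is the format specifier the original calls lacked.
void print_net_salary(double gross) {
    std::printf("Salário líquido: %.2f\n", net_salary(gross));
}
```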
There are actually two major kinds of problems with the posted code.
printf("Salário líquido : ",sal);
is missing a format specifier for the 'sal' variable
it should be:
printf("Salario liquido : %f", sal);
Note: every printf() call that prints a value has this same problem
if(sal>2246.75)&&(sal<2995.70){
is missing the outside parens
it should be:
if( (sal>2246.75) && (sal<2995.70) ) {
Note: I added some horizontal spacing for clarity only
the last two 'if' statements have this same problem
Suggest compiling with all warnings enabled.
For gcc, at a minimum, use '-Wall -Wextra -pedantic'
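main always returns an 'int'.
Since C99, falling off the end of main implicitly returns 0, but in C89 the returned status is undefined. Either way, an explicit return makes the intent clear, so end the function with:
return(0);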
I think that if(sal>2246.75)&&(sal<2995.70) is supposed to be if(sal>2246.75 && sal<2995.70).

ANTLR - Keep a block unchanged

I am a beginner with ANTLR, and I need to modify an existing, complex grammar.
I want to create a rule that keeps a block unchanged, without parsing it with other rules.
To be clearer, I need to embed code written in C++ into the interpreted code.
Edit 11/02/2013
After many tests, here are my grammar, my test, the result I get, and the result I want:
Grammar
cppLiteral
: cppBegin cppInnerTerm cppEnd
;
cppBegin
: '//$CPP_IN$'
;
cppEnd
: '//$CPP_OUT$'
;
cppInnerTerm
: ( ~('//$CPP_OUT$') )*
;
Test
//$CPP_IN$
txt1 txt2
//$CPP_OUT$
Result
cppLiteral ->
cppBegin = '//$CPP_IN$'
cppInnerTerm = 'txt1' 'txt2'
cppEnd = '//$CPP_OUT$'
Expected result
cppLiteral ->
cppBegin = '//$CPP_IN$'
cppInnerTerm = 'txt1 txt2'
cppEnd = '//$CPP_OUT$'
The three tokens "cppBegin", "cppInnerTerm" and "cppEnd" can be in one token, like this:
cppLiteral
: '//$CPP_IN$'( ~('//$CPP_OUT$') )*'//$CPP_OUT$'
;
to have this result:
cppLiteral = '//$CPP_IN$\n txt1 txt2\n //$CPP_OUT$'
I want to create a rule to keep a block without parsing with other rules.
Parse it like a multiline comment, e.g. /* foobar */. Below is a small example using the keywords specified in your question.
Note that most of the work is done with lexer rules (those that start with a capital letter). Any time you want to deal with blocks of text, particularly if you want to avoid other rules as in this case, you're probably thinking in terms of lexer rules rather than parser rules.
CppBlock.g
grammar CppBlock;
document: CPP_LITERAL* EOF;
fragment CPP_IN:'//$CPP_IN$';
fragment CPP_OUT:'//$CPP_OUT$';
CPP_LITERAL: CPP_IN .* CPP_OUT
{
String t = getText();
t = t.substring(10, t.length() - 11); //10 = length of CPP_IN, 11 = length of CPP_OUT
setText(t);
}
;
WS: (' '|'\t'|'\f'|'\r'|'\n')+ {skip();};
Here is a simple test case:
Input
//$CPP_IN$
static const int x = 0; //magic number
int *y; //$CPP_IN$ <-- junk comment
static void foo(); //forward decl...
//$CPP_OUT$
//$CPP_IN$
//Here is another block of CPP code...
const char* msg = ":D";
//The end.
//$CPP_OUT$
Output Tokens
[CPP_LITERAL :
static const int x = 0; //magic number
int *y; //$CPP_IN$ <-- junk comment
static void foo(); //forward decl...
]
[CPP_LITERAL :
//Here is another block of CPP code...
const char* msg = ":D";
//The end.
]
Rule CPP_LITERAL preserves newlines at the beginning and end of the input (after //$CPP_IN$ and before //$CPP_OUT$). If you don't want those, just update the action to strip them out. Otherwise, I think this grammar does what you're asking for.
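To check the CPP_LITERAL behaviour outside ANTLR, here is a C++ sketch based on plain string searching. It is not ANTLR-generated code, and the function name extract_cpp_blocks is hypothetical. It mimics the rule as shown in the output above: each //$CPP_IN$ is matched with the first //$CPP_OUT$ that follows it, so a stray //$CPP_IN$ inside a block is just text. The markers are stripped like the substring(10, t.length() - 11) action does. Unlike the grammar, it silently ignores any text outside the markers.

```cpp
#include <string>
#include <vector>

// Returns the text between each "//$CPP_IN$" and the next "//$CPP_OUT$",
// with both markers stripped (newlines next to the markers are kept).
std::vector<std::string> extract_cpp_blocks(const std::string& input) {
    const std::string open = "//$CPP_IN$";    // 10 characters
    const std::string close = "//$CPP_OUT$";  // 11 characters
    std::vector<std::string> blocks;
    std::string::size_type pos = 0;
    while ((pos = input.find(open, pos)) != std::string::npos) {
        const auto body_start = pos + open.size();
        // Non-greedy: stop at the first closing marker after the opener.
        const auto end = input.find(close, body_start);
        if (end == std::string::npos) break;  // unterminated block: stop
        blocks.push_back(input.substr(body_start, end - body_start));
        pos = end + close.size();
    }
    return blocks;
}
```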