Cyclic Dependency in reentrant flex / bison headers with union YYSTYPE

I have a problem where I believe there is a cyclic dependency between the headers generated by flex and bison. The type yyscan_t is defined in the lex header and needed in the yacc header. The macro YYSTYPE is defined in the yacc header and needed in the lex header. No matter which order I include the two headers in, the one included first fails to compile because it needs a name that only the other one defines.
reentrant.lex:
%{
#include "reentrant.yacc.h"
%}
%option reentrant bison-bridge
%%
[0-9]+ { yylval->int_value = atoi(yytext); return INT_TERM; }
[a-zA-Z]+ { yylval->str_value = strdup(yytext); return STR_TERM; }
, { return COMMA; }
reentrant.yacc:
%{
// UN-COMMENT THE FOLLOWING LINE TO "FIX" THE CYCLE
//typedef void * yyscan_t;
#include "reentrant.yacc.h"
#include "reentrant.lex.h"
void yyerror(yyscan_t scanner, char * msg);
%}
%define api.pure full
%lex-param {yyscan_t scanner}
%parse-param {yyscan_t scanner}
%union {
int int_value;
char * str_value;
}
%token <int_value> INT_TERM
%token <str_value> STR_TERM
%token COMMA
%type <int_value> int_non_term
%type <str_value> str_non_term
%%
complete : int_non_term str_non_term { printf(" === %d === %s === \n", $1, $2); }
int_non_term : INT_TERM COMMA { $$ = $1; }
str_non_term : STR_TERM COMMA { $$ = $1; }
%%
int main(void) {
yyscan_t scanner;
yylex_init(&scanner) ;
yyset_debug(1, scanner);
yydebug=1;
int val = yyparse(scanner);
yylex_destroy (scanner) ;
return val;
}
int yywrap (yyscan_t scanner) {
return 1;
}
void yyerror(yyscan_t scanner, char * msg) {
fprintf(stderr, "%s\n", msg);
}
GCC Output:
In file included from reentrant.yacc:5:0:
reentrant.yacc.h:74:14: error: unknown type name ‘yyscan_t’
int yyparse (yyscan_t scanner);
^~~~~~~~
Command Arguments Used:
bison -vt --debug --defines=reentrant.yacc.h -o reentrant.yacc.c reentrant.yacc
flex -8 -d --header-file=reentrant.lex.h -o reentrant.lex.c reentrant.lex
gcc -Wall -Wno-unused-function -g reentrant.lex.c reentrant.yacc.c -o reentrant
Software Versions:
$ flex --version
flex 2.6.4
$ bison --version
bison (GNU Bison) 3.0.4
Written by Robert Corbett and Richard Stallman.
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc --version
gcc (GCC) 6.3.1 20170306
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
YYSTYPE:
Here you can see that YYSTYPE is defined in the yacc header and consumed in the lex header.
$ grep 'YYSTYPE' *.h
reentrant.lex.h:YYSTYPE * yyget_lval ( yyscan_t yyscanner );
reentrant.lex.h:void yyset_lval ( YYSTYPE * yylval_param , yyscan_t yyscanner );
reentrant.lex.h: (YYSTYPE * yylval_param , yyscan_t yyscanner);
reentrant.lex.h: (YYSTYPE * yylval_param , yyscan_t yyscanner)
reentrant.yacc.h:#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
reentrant.yacc.h:union YYSTYPE
reentrant.yacc.h:typedef union YYSTYPE YYSTYPE;
reentrant.yacc.h:# define YYSTYPE_IS_TRIVIAL 1
reentrant.yacc.h:# define YYSTYPE_IS_DECLARED 1
yyscan_t:
Here you can see that yyscan_t is defined in the lex header and consumed in the yacc header.
$ grep 'yyscan_t' *.h
reentrant.lex.h:typedef void* yyscan_t;
<snip lots of function decls including yyscan_t>
reentrant.yacc.h:int yyparse (yyscan_t scanner);

This is a non-answer, really, but I see that the question has not received the attention it should have so I am posting this as a depressing reminder of this serious flaw in bison's design.
It is not possible (at least with the current versions of bison and flex) to include the two generated headers cleanly. The reason is the layout of the header that bison produces: the definition of union YYSTYPE (the expansion of the %union declaration) and its typedef are followed immediately by the declaration int yyparse (yyscan_t scanner); (with yy replaced by your own prefix if you changed it). Because there is no mechanism for inserting the flex-generated header between those two declarations, the conflict arises wherever the lexer header is included. Include the lexer header before the parser header (or before the %union declaration) and gcc complains that it does not know what YYSTYPE is. Include it afterwards and the compiler does not know what yyscan_t means in the yyparse declaration.
Unless bison emits the different chunks of its %defines file separately, it is unclear how this can be resolved properly. A practical workaround is to define yyscan_t as void * yourself (as a macro or as a typedef; both work and both are equally ugly), then include the parser header, and only then include the header produced by flex.
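As a rough illustration of that workaround, the C sketch below inlines stand-ins for the two generated headers so that it compiles on its own. The guard macro YY_TYPEDEF_YY_SCANNER_T is what flex 2.6.x appears to wrap around its own typedef (worth confirming in your generated reentrant.lex.h); the stub bodies of yylex and yyparse are hypothetical and exist only so that the example links.

```c
#include <stddef.h>

/* Step 1: declare the scanner handle ourselves, using the same guard macro
   that flex 2.6.x places around its own typedef (assumption: check the
   generated reentrant.lex.h for YY_TYPEDEF_YY_SCANNER_T). */
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void *yyscan_t;
#endif

/* Step 2: stand-in for #include "reentrant.yacc.h". */
union YYSTYPE {
    int int_value;
    char *str_value;
};
typedef union YYSTYPE YYSTYPE;
int yyparse(yyscan_t scanner);

/* Step 3: stand-in for #include "reentrant.lex.h". The guard turns flex's
   typedef into a no-op, and YYSTYPE is already known at this point. */
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void *yyscan_t;
#endif
int yylex(YYSTYPE *yylval_param, yyscan_t yyscanner);

/* Hypothetical stub bodies; the real ones are generated by flex and bison. */
int yylex(YYSTYPE *yylval_param, yyscan_t yyscanner) {
    (void)yyscanner;
    yylval_param->int_value = 42;
    return 0; /* 0 signals end of input */
}

int yyparse(yyscan_t scanner) {
    YYSTYPE value;
    int token = yylex(&value, scanner);
    /* Succeed (return 0) only if the stub scanner behaved as expected. */
    return (token == 0 && value.int_value == 42) ? 0 : 1;
}
```

Defining the guard macro together with the typedef avoids a duplicate typedef, which compilers older than C11 may reject or warn about.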

Related

Partial preprocessing of C files with GCC (not removing "define" directives)

GCC can output a fully preprocessed C++ source file if I pass the -E or -save-temps command line argument.
My question is, can I somehow get a partially preprocessed C++ source file in which
a) code fragments not meeting #if, #ifdef and #ifndef conditions are eliminated,
b) #include directives are resolved (header files are included), etc
BUT
c) ordinary #define directives are NOT resolved?
(This would be necessary and really helpful because I would like to have the most compact and readable output possible.
Resolving the #if directives shortens the source code, but resolving #define directives makes the source less readable and more redundant.)
I have tried to create an example as compact as possible in order to demonstrate what I would like to achieve:
Sample input files:
// header1.h
#ifndef header1_h
#define header1_h
int function1(int val) {
return val + MYCONST;
}
#endif
// header2.h
#ifndef header2_h
#define header2_h
int function1(int val) {
return val + val + MYCONST;
}
#endif
// main.c
#define MYCONST 1234
#define SETTING1
#ifdef SETTING1
#include "header1.h"
#endif
#ifdef SETTING2
#include "header2.h"
#endif
int main(void) {
int retVal = function1(99);
}
Expected output:
// main.i (GCC preprocessing output)
#define MYCONST 1234 // I would like to see the definition of MYCONST here
#define SETTING1
#define header1_h
int function1(int val) {
return val + MYCONST; // I would like to see MYCONST here instead of the resolved value
}
int main(void) {
int retVal = function1(99);
}
gcc has an option, -fdirectives-only, which does something close to what you want:
-fdirectives-only
When preprocessing, handle directives, but do not expand macros.
The option’s behavior depends on the -E and -fpreprocessed options.
With -E, preprocessing is limited to the handling of directives such as #define, #ifdef, and #error. Other preprocessor operations, such as macro expansion and trigraph conversion are not performed. In addition, the -dD option is implicitly enabled.
With -fpreprocessed, predefinition of command line and most builtin macros is disabled. Macros such as __LINE__, which are contextually dependent, are handled normally. This enables compilation of files previously preprocessed with -E -fdirectives-only.
With both -E and -fpreprocessed, the rules for -fpreprocessed take precedence. This enables full preprocessing of files previously preprocessed with -E -fdirectives-only.
In your case, it should be called
% gcc -fdirectives-only -E -o main.i main.c
but you get more defines (those internally defined), blank lines and #line lines than what you ask for.

cast from ‘SymbolInfo*’ to ‘YYSTYPE {aka int}’ loses precision

In my .l file, I have the following:
<some include files....>
#include "SymbolInfo.h"
#include "y.tab.h"
using namespace std;
int line_count = 1;
int TOTAL_ERROR = 0;
extern SymbolTable symbolTable;
extern FILE *yyin,*yyout;
extern YYSTYPE yylval;
<further declarations...>
%%
"int" {
SymbolInfo* s = new SymbolInfo(string(yytext),"INT");
yylval = (YYSTYPE)s;
return INT;
}
<more patterns....>
%%
In my .y file, I have defined YYSTYPE:
#include "SymbolInfo.h"
int SymbolTable::id = 0;
#define YYSTYPE SymbolInfo*
But when I try to compile it, it gives the following error:
Lexical Analyzer.l: In function ‘int yylex()’:
Lexical Analyzer.l:162:27: error: cast from ‘SymbolInfo*’ to ‘YYSTYPE {aka int}’ loses precision [-fpermissive]
yylval = (YYSTYPE)res;
^
My question is, why is it giving a compilation error even after defining YYSTYPE as SymbolInfo*? How can I handle this error ?
why is it giving a compilation error even after defining YYSTYPE as SymbolInfo*?
Because, as you say, you are defining YYSTYPE in the parser definition. The error is occurring in the scanner definition. Those are separate files and macros defined in one will not be visible in the other.
You could put the #define YYSTYPE line in your .l file before you include the header file generated by bison. Or you could put it in a %code requires section in your bison file, so that it gets inserted into the bison-generated header.
But what you should do is to avoid the macro and use a bison declaration:
%define api.value.type { SymbolInfo* }
That will not only correctly define the semantic type, it will also be placed into the bison-generated header file so that you don't have to worry about defining YYSTYPE in other source files (as long as they #include the bison header file).
Please don't use bison's -y option. It's only appropriate for legacy code. New code should be written to the bison interface. Unless otherwise specified, bison (without the -y flag) will put the generated code in <name>.tab.c and the header file in <name>.tab.h. If you are generating C++, you probably want to specify the output filename with --output (or -o), and if necessary specify the header filename with --defines (instead of -d; --defines lets you specify a filename). See Bison options for details.
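To make the first option (defining YYSTYPE before the bison header is included) concrete, here is a minimal C sketch. The #if guard is a stand-in for the one bison writes into its header; struct SymbolInfo, emit_int_keyword and the token number 258 are hypothetical placeholders, and the same ordering applies when the file is compiled as C++.

```c
/* Hypothetical C stand-in for the asker's SymbolInfo class. */
struct SymbolInfo {
    const char *name;
    const char *type;
};

/* The scanner must see this BEFORE the bison header is included. */
#define YYSTYPE struct SymbolInfo *

/* Stand-in for the guard in the bison-generated header: int is only the
   fallback used when nothing has defined YYSTYPE yet. */
#if !defined YYSTYPE && !defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
#define YYSTYPE_IS_DECLARED 1
#endif

YYSTYPE yylval;

/* Hypothetical scanner action: no cast is needed, the types already match. */
int emit_int_keyword(struct SymbolInfo *s) {
    yylval = s;
    return 258; /* placeholder token number */
}
```

If the #define line is moved below the guard, yylval falls back to int and the assignment no longer type-checks, which is the situation the error message describes.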

Make yylex return symbol_type instead of int

I'm trying to return symbol objects from yylex, as is shown in this documentation http://www.gnu.org/software/bison/manual/html_node/Complete-Symbols.html
However, when I compile, I find that return yy::parser::make_PLUS(); gets put into int yyFlexLexer::yylex(), so I get this error message (and many similar ones from other rules):
lexer.ll:22:10: error: no viable conversion from 'parser::symbol_type' (aka 'basic_symbol<yy::parser::by_type>') to 'int'
{ return yy::parser::make_PLUS(); }
What is the correct way to fix this?
lexer.ll
%{
#include "ASTNode.hpp"
// why isn't this in parser.tab.hh?
# ifndef YY_NULLPTR
# if defined __cplusplus && 201103L <= __cplusplus
# define YY_NULLPTR nullptr
# else
# define YY_NULLPTR 0
# endif
# endif
#include "parser.tab.hh"
#define yyterminate() return yy::parser::make_END()
%}
%option nodefault c++ noyywrap
%%
"+" { return yy::parser::make_PLUS(); }
"-" { return yy::parser::make_MINUS(); }
... more rules ...
%%
parser.yy
%{
#include "AstNode.hpp"
#include ...
static int yylex(yy::parser::semantic_type *arg);
%}
%skeleton "lalr1.cc"
%define api.token.constructor
%define api.value.type variant
%define parse.assert
%token END 0
%token PLUS
%token MINUS
%token ... many tokens ...
%type <ASTNode *> S statement_list ...
%%
S: statement_list
{ $$ = g_ast = (StatementList *)$1; }
;
... more rules ...
%%
static int yylex(yy::parser::semantic_type *arg) {
(void)arg;
static FlexLexer *flexLexer = new yyFlexLexer();
return flexLexer->yylex();
}
void yy::parser::error(const std::string &msg) {
std::cout << msg << std::endl;
exit(1);
}
You have to declare yylex, both in the generated scanner and in the generated parser, with the correct signature. Obviously, returning int is not what you want.
In the calc++ example included in the bison distribution (and described in the bison manual), you can see how to do this:
Then comes the declaration of the scanning function. Flex expects the signature of yylex to be defined in the macro YY_DECL, and the C++ parser expects it to be declared. We can factor both as follows.
// Tell Flex the lexer's prototype ...
# define YY_DECL \
yy::calcxx_parser::symbol_type yylex (calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;
That's just the normal way of changing the yylex declaration. Although the bison manual doesn't mention this, and the .ll suffix is arguably misleading, it is not using the C++ flex skeleton. It is using the C skeleton to generate a file which can be compiled with C++. As far as I can see, it is not even generating a reentrant lexer.
There is also an important option in the calc++.yy file:
The driver is passed by reference to the parser and to the scanner. This provides a simple but effective pure interface, not relying on global variables.
// The parsing context.
%param { calcxx_driver& driver }
That indicates that calcxx_driver& driver is an argument both to the parser and to the scanner. That is, you provide it to the parser, and the parser automatically passes it through to the scanner. That matches the yylex prototype generated with YY_DECL.
You might not actually need that object in your scanner actions. I don't think that its use is mandatory, but I have hardly ever used the C++ APIs in either bison or flex, so I could well be wrong.
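The YY_DECL mechanism itself is plain preprocessor work, so it can be sketched in C without the C++ skeleton. In the sketch below, symbol_type, driver_t and the token numbers are hypothetical stand-ins for yy::parser::symbol_type, the driver object and the bison-generated token kinds. The hand-written function body takes the place of the scanner body that flex emits after YY_DECL.

```c
/* Hypothetical stand-ins for yy::parser::symbol_type and the driver. */
typedef struct {
    int kind;  /* token kind */
    int value; /* semantic value (unused in this sketch) */
} symbol_type;

typedef struct {
    const char *input; /* text still to be scanned */
} driver_t;

/* Tell the scanner which prototype to use ... */
#define YY_DECL symbol_type yylex(driver_t *driver)
/* ... and declare it for the parser's sake. */
YY_DECL;

/* Placeholder token numbers; the real ones come from the bison header. */
enum { TOK_END = 0, TOK_PLUS = 258, TOK_MINUS = 259 };

/* Flex emits the scanner as "YY_DECL { ... }"; this hand-written body
   stands in for the generated one and recognises only '+' and '-'. */
YY_DECL {
    symbol_type sym = { TOK_END, 0 };
    if (*driver->input == '+') sym.kind = TOK_PLUS;
    else if (*driver->input == '-') sym.kind = TOK_MINUS;
    if (sym.kind != TOK_END) driver->input++; /* consume the character */
    return sym;
}
```

Because the same macro produces both the declaration and the definition, the parser and the scanner cannot drift apart on the signature.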

Multiple definitions?

I'm having trouble compiling my flex and bison code, more specifically my parser.yy file. In this file I included MathCalc.h and BaseProg.h, which are classes I've made. The issue is that when I instantiate the classes, I get a "multiple definition" error at link time. Any help would be appreciated! Thank you!!
Parser.yy (snippet):
%code requires {
#include <iostream>
#include <cmath>
#include "MathCalc.h"
#include "BaseProg.h"
/* Parser error reporting routine */
void yyerror(const char *msg);
/* Scanner routine defined by Flex */
int yylex();
using namespace std;
BaseProg bprog;
MathCalc calc;
enum Type { INT, FLT};
}
/* yylval union type */
%union {
double dval;
int ival;
char* name;
Type type;
}
error:
bison -d parser.yy
g++ -c -o scanner.o scanner.cc
g++ -c -o parser.tab.o parser.tab.cc
g++ scanner.o parser.tab.o BaseProg.o MathCalc.o -lfl -o ../Proj2
parser.tab.o:(.bss+0x0): multiple definition of `bprog'
scanner.o:(.bss+0x28): first defined here
parser.tab.o:(.bss+0x1): multiple definition of `calc'
scanner.o:(.bss+0x29): first defined here
collect2: ld returned 1 exit status
Any code in a %code requires block will be placed both in the parser source file and in the parser header file. It's normal to #include the parser header file in the scanner source (after all, that's why bison generates a header file), so it is unwise to put global variable definitions in a %code requires block.
Indeed, it is always unwise to put global variable definitions in a header file, precisely because the header file is likely to be included in more than one source file, with the result that any global definitions (as opposed to declarations) will be inserted into more than one translation unit, violating the ODR.
For the header file, you should mark these objects (BaseProg bprog; and MathCalc calc;) as extern, and then make sure you actually define them in some source file. Or, even better, you should avoid the use of globals in the first place.
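The declaration/definition split can be sketched as follows, written in C for brevity (the pattern is the same in C++). struct BaseProg here is a hypothetical stand-in for the asker's class, and record_statement is an invented helper. In a real project the first part would live in the header (or the %code requires block) and the second part in exactly one source file.

```c
/* --- Belongs in the header: safe to include in every translation unit --- */
struct BaseProg {
    int statements; /* hypothetical member */
};
extern struct BaseProg bprog; /* declaration only, no storage allocated */

/* --- Belongs in exactly one source file --- */
struct BaseProg bprog = { 0 }; /* the single definition */

/* Any translation unit that saw the declaration can use the object. */
int record_statement(void) {
    return ++bprog.statements;
}
```

With this split, scanner.o and parser.tab.o both refer to bprog, but only the one file containing the definition contributes the symbol to the link.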

Class as Return Type in Bison: Where to define the class?

I've found three tutorials that cover writing a parser in C++ using Bison: here, here, and here. The first two don't cover how to use a class as the return type of a rule. The third does cover it, but it doesn't appear to clearly explain the stipulations.
Here is a simple parser.yy file which uses a class in this manner.
%{
#include <stdio.h>
extern FILE * yyin;
int yyerror(char *s)
{
fflush(stdout);
printf("error\n");
}
int yywrap(void) {
return 1;
}
int yyparse() { return 1; }
int main() {
yyparse();
return 1;
}
class myclass { int x; };
%}
%union {
int token;
myclass * mc;
}
%token <token> REGISTER
%type <mc> start_sym
%start start_sym
%%
start_sym :
REGISTER '+' { $$ = new myclass(); }
;
Bison runs with no problems using this input. I defined a simple input for flex to use with this. However, when I try to compile it, I get the error:
$ g++ parser.tab.cc lex.yy.cc
In file included from lex.ll:11:0:
parser.yy:31:5: error: ‘myclass’ does not name a type
myclass * mc;
^
Where is the appropriate place to declare myclass?
The error is coming from your lex-generated file, and is the result of your not putting a definition for your class in a place where the lex-generated file can see it. Putting the definition in a separate header file and including that header file in both the scanner (flex) and parser (bison) files is the simplest solution; moreover, it makes sense because there are probably other compilation units (the consumers of the parse) which also require these headers.
If you are using a relatively modern version of bison, for the case of type definitions which really only need to be visible to the scanner and parser, and no other component, it is possible to get bison to insert definitions into the header file which it generates by putting a %code requires block into your prologue:
%code requires {
class myclass {
// ...
};
}
(You can also use a %code provides block, but normally you would want these definitions to be available when the semantic type union is defined, which is the location of %code requires.)
For more information, see the bison manual.
With respect to the extern "C" issue, I would have thought that to be unnecessary if you include the flex-generated header file in your bison input. You also might want to specify %option noyywrap in your flex input, unless you are actually using yywrap. The yywrap included in -lfl has "C" linkage, but as far as I know the extern "C" yywrap declaration does occur in the flex-generated header file.
Having said all that, I need to confess that I don't use the C++ APIs for either bison or flex. I prefer to just use the C interfaces; the resulting files can be compiled cleanly with a C++ compiler (at least with recent versions of the bison and flex tools), and you can use pointers to C++ objects in your semantic union without problems.
I've found that the only place that I can declare the class without getting a compilation error is in a separate header file ("myclass.h"). This header file must be included in both parser.yy and scanner.ll. In scanner.ll, it must be included before parser.tab.hh.
Additionally, the only way I've gotten it to compile without the compiler complaining about yyparse(), yylex(), or yywrap() is to include prototypes for these functions in the file myclass.h with external C linkage, like so:
#ifndef __MYCLASS_H_
#define __MYCLASS_H_
class myclass { int x; };
extern "C" {
int yyparse(void);
int yylex(void);
int yywrap(void);
}
#endif
If anyone is aware of a better solution, please do let me know.