I'm working with Flex and Bison in C++. I am learning to use these tools, and the best way to start is by writing a simple calculator. After generating the application (the executable) from my calc.y and calc.l files, I can run the .exe file and use it. Now I want to include it in a C++ file so I can use it in my application, but I can't. I think it's my fault: either I'm including the generated file incorrectly or I'm generating the code to import incorrectly.
main.cpp
#include <iostream>
extern "C" {
#include "y.tab.h"
}
int main ( int argc, char *argv[] ) {
yyparse();
printf(elementos);
return 0;
}
calc.l
%{
#include "y.tab.h"
#include <stdlib.h>
void yyerror(char *);
%}
%%
[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}
[-+()\n] {
return *yytext;
}
[ \t] ;
. {
yyerror("Invalid character.");
}
%%
int yywrap(void) {
return 1;
}
calc.y
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
int sym[26];
int elementos = 0;
%}
%token INTEGER VARIABLE
%left '+' '-'
%left '*' '/'
%%
program:
program expr '\n' { printf("%d\n", $2 ); }
|
;
statement:
expr { printf("%d\n", $1); }
| VARIABLE '=' expr { sym[$1] = $3; }
;
expr:
INTEGER { $$ = $1; }
| expr '+' expr { $$ = $1 + $3; elementos = elementos + 1;}
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '(' expr ')' { $$ = $2; }
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(void) {
yyparse();
return 0;
}
The y.tab.h is generated by bison. When I'm trying to compile main.cpp I'm getting an error:
command: gcc main.cpp -o main.exe
result: main.cpp: In function 'int main(int, char**)':
main.cpp:8:10: error: 'yyparse' was not declared in this scope
main.cpp:9:9: error: 'elementos' was not declared in this scope
How can I fix it?
I'm using gcc version 4.7.2, bison 2.4.1 and 2.5.4 on Windows 8.1.
Thanks!
EDIT:
The y.tab.h file is:
/* Tokens. */
#ifndef YYTOKENTYPE
# define YYTOKENTYPE
/* Put the tokens into the symbol table, so that GDB and other debuggers
know about them. */
enum yytokentype {
INTEGER = 258,
VARIABLE = 259
};
#endif
/* Tokens. */
#define INTEGER 258
#define VARIABLE 259
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
# define YYSTYPE_IS_TRIVIAL 1
# define yystype YYSTYPE /* obsolescent; will be withdrawn */
# define YYSTYPE_IS_DECLARED 1
#endif
extern YYSTYPE yylval;
There's no elementos variable in it, but when I looked in the generated y.tab.c file, I found that it is defined there!
You have a number of problems:
Bison and Flex generate C code, which you then need to compile and link with your program. Your question shows no indication that you have done this.
If you want to be able to use the elementos variable in your main.cpp file, then you need to declare it. It may be defined somewhere else, but the compiler doesn't know that when it compiles main.cpp. Add this line inside the extern "C" part: extern int elementos;
You have two different main functions (one in calc.y and one in main.cpp); a program can only have one.
In main.cpp, you #include <iostream>, but then use printf, which is declared in <cstdio> (stdio.h).
The call to printf is wrong. It needs a format string.
Bison shows several warnings, which you probably need to read and do something about if you want your program to work.
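To make the declaration point concrete, here is a minimal sketch of what main.cpp could look like once main() is removed from calc.y. In a real build the definitions come from y.tab.c (and lex.yy.c), compiled as C and linked in; here they are replaced by hypothetical stubs so the snippet compiles on its own, and the body of main() is shown as a function named run_calculator (also hypothetical).

```cpp
#include <cassert>
#include <cstdio>

// What main.cpp must declare: these symbols live in y.tab.c, which is
// compiled as C, so C++ code has to see them with C linkage.
extern "C" {
int yyparse(void);
extern int elementos;
}

// ---- Hypothetical stand-ins for y.tab.c, only so this sketch links alone ----
extern "C" {
int elementos = 0;
int yyparse(void) {
    elementos = 2;  // pretend the input "1+2+3\n" was parsed (two '+' reductions)
    return 0;       // a Bison parser returns 0 on success
}
}
// ------------------------------------------------------------------------------

// The body of a corrected main(): parse, then print with a format string.
int run_calculator() {
    int status = yyparse();
    std::printf("%d\n", elementos);
    return status;
}
```

For the real program you would compile the generated C files with gcc -c and link the resulting objects with main.cpp using g++ rather than gcc, since g++ also links the C++ standard library.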
I'm trying to write my toy language with the flex/bison toolchain in C++14.
I'm confused: when I use the Bison C++ variant with a reentrant Flex scanner, yylex cannot find the parameter yylval.
My development environment is a MacBook with the latest OS and Xcode, plus flex 2.6.4 and bison 3.7.1 installed via Homebrew.
For convenience, you can download the project with the error here: https://github.com/linrongbin16/tree.
Now let me introduce this not-so-simple tree project:
First let's see the makefile
clean:
rm *.o *.out *.yy.cc *.yy.hh *.tab.cc *.tab.hh *.output
tree.out: tree.o token.yy.o parser.tab.o
clang++ -std=c++14 -o tree.out tree.o token.yy.o parser.tab.o
token.yy.cc token.yy.hh: token.l
flex --debug -o token.yy.cc --header-file=token.yy.hh token.l
parser.tab.cc parser.tab.hh: parser.y
bison --debug --verbose -Wcounterexamples -o parser.tab.cc --defines=parser.tab.hh parser.y
token.yy.o: token.yy.cc
clang++ -std=c++14 -g -c token.yy.cc token.yy.hh
parser.tab.o: parser.tab.cc
clang++ -std=c++14 -g -c parser.tab.cc parser.tab.hh
tree.o: tree.cpp parser.tab.hh token.yy.hh
clang++ -std=c++14 -g -c tree.cpp
The application is tree.out, which depends on three components: tree, token and parser.
tree component
tree.h defines a simple abstract syntax tree class; since I haven't implemented it yet, it has only a virtual destructor:
#pragma once
struct Tree {
virtual ~Tree() = default;
};
tree.cpp contains the main function, which reads a file name from the command line, initializes the lexer and parser, and does the parsing:
#include "parser.tab.hh"
#include "token.yy.hh"
#include <cstdio>
#include <cstdlib>
struct Scanner {
yyscan_t yyscanner;
FILE *fp;
YY_BUFFER_STATE yyBufferState;
Scanner(const char *fileName) {
yylex_init_extra(this, &yyscanner);
fp = std::fopen(fileName, "r");
if (!fp) {
printf("file %s cannot open!\n", fileName);
exit(-1);
}
yyBufferState = yy_create_buffer(fp, YY_BUF_SIZE, yyscanner);
yy_switch_to_buffer(yyBufferState, yyscanner);
yyset_lineno(1, yyscanner);
}
virtual ~Scanner() {
if (yyBufferState) {
yy_delete_buffer(yyBufferState, yyscanner);
}
if (yyscanner) {
yylex_destroy(yyscanner);
}
if (fp) {
std::fclose(fp);
}
}
};
int main(int argc, char **argv) {
if (argc != 2) {
printf("missing file name!\n");
return -1;
}
Scanner scanner(argv[1]);
yy::parser parser(scanner.yyscanner);
if (parser.parse() != 0) {
printf("parsing failed!\n");
return -1;
}
return 0;
}
The important thing is that I use the Bison C++ variant and the Flex reentrant feature, because I want the project to be modern (C++14) and safe with multiple threads. That makes initialization a little complex, but it's worth it once the project grows large.
lexer component
token.l:
%option noyywrap noinput nounput
%option nodefault
%option nounistd
%option reentrant
%{
#include <cstdio>
#include <cstring>
#include "parser.tab.hh"
%}
%%
"+" { yylval->emplace<int>(yy::parser::token::PLUS); return yy::parser::token::PLUS; }
"-" { yylval->emplace<int>(yy::parser::token::MINUS); return yy::parser::token::MINUS; }
"*" { yylval->emplace<int>(yy::parser::token::TIMES); return yy::parser::token::TIMES; }
"/" { yylval->emplace<int>(yy::parser::token::DIVIDE); return yy::parser::token::DIVIDE; }
"(" { yylval->emplace<int>(yy::parser::token::LPAREN); return yy::parser::token::LPAREN; }
")" { yylval->emplace<int>(yy::parser::token::RPAREN); return yy::parser::token::RPAREN; }
";" { yylval->emplace<int>(yy::parser::token::SEMICOLON); return yy::parser::token::SEMICOLON; }
"=" { yylval->emplace<int>(yy::parser::token::EQUAL); return yy::parser::token::EQUAL; }
[a-zA-Z][a-zA-Z0-9]+ { yylval->emplace<std::string>(yytext); return yy::parser::token::ID; }
[0-9]+ { yylval->emplace<int>(atoi(yytext)); return yy::parser::token::NUM; }
%%
Here I followed the split-symbol section of the Bison manual (NOTE: this is where the compile error occurs; I also tried the make_XXX API, which also gives me errors).
It generates token.yy.cc and token.yy.hh, which are expected to compile into a token.yy.o object.
parser component
parser.y:
%require "3.2"
%language "c++"
%define api.value.type variant
%define api.token.constructor
%define parse.assert
%define parse.error verbose
%define parse.lac full
%locations
%param {yyscan_t yyscanner}
%code top {
#include <memory>
}
%code requires {
#include <memory>
#include "token.yy.hh"
#include "tree.h"
#define SP_NULL (std::shared<Tree>(nullptr))
}
%token<int> PLUS '+'
%token<int> MINUS '-'
%token<int> TIMES '*'
%token<int> DIVIDE '/'
%token<int> SEMICOLON ';'
%token<int> EQUAL '='
%token<int> LPAREN '('
%token<int> RPAREN ')'
%token<int> NUM
%token<std::string> ID
%type<std::shared_ptr<Tree>> prog assign expr literal
/* operator precedence */
%right EQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%start prog
%%
prog : assign { $$ = SP_NULL; }
| prog ';' assign { $$ = SP_NULL }
;
assign : ID '=' expr { $$ = SP_NULL; }
| expr { $$ = $1; }
;
expr : literal { $$ = SP_NULL; }
| expr '+' literal { $$ = SP_NULL; }
| expr '-' literal { $$ = SP_NULL; }
| expr '*' literal { $$ = SP_NULL; }
| expr '/' literal { $$ = SP_NULL; }
;
literal : ID { $$ = SP_NULL; }
| NUM { $$ = SP_NULL; }
;
%%
I followed the Bison C++ variant manual. It generates parser.tab.cc, parser.tab.hh and parser.output; the .output file is just for analysis.
Since flex is reentrant, I need to add a parameter %param {yyscan_t yyscanner}.
error message
Here's the error message when running make tree.out:
bison --debug --verbose -Wcounterexamples -o parser.tab.cc --defines=parser.tab.hh parser.y
flex --debug -o token.yy.cc --header-file=token.yy.hh token.l
clang++ -std=c++14 -g -c tree.cpp
clang++ -std=c++14 -g -c token.yy.cc token.yy.hh
token.yy.cc:820:10: error: use of undeclared identifier 'yyin'; did you mean 'yyg'?
if ( ! yyin )
^~~~
yyg
token.yy.cc:807:23: note: 'yyg' declared here
struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
^
token.yy.cc:822:4: error: use of undeclared identifier 'yyin'
yyin = stdin;
^
token.yy.cc:827:10: error: use of undeclared identifier 'yyout'
if ( ! yyout )
^
token.yy.cc:829:4: error: use of undeclared identifier 'yyout'
yyout = stdout;
^
token.yy.cc:837:23: error: use of undeclared identifier 'yyin'
yy_create_buffer( yyin, YY_BUF_SIZE , yyscanner);
^
token.yy.cc:895:3: error: use of undeclared identifier 'YY_DO_BEFORE_ACTION'
YY_DO_BEFORE_ACTION;
^
token.yy.cc:902:8: error: use of undeclared identifier 'yy_flex_debug'; did you mean 'yyget_debug'?
if ( yy_flex_debug )
^~~~~~~~~~~~~
yyget_debug
token.yy.cc:598:5: note: 'yyget_debug' declared here
int yyget_debug ( yyscan_t yyscanner );
^
token.yy.cc:908:45: error: use of undeclared identifier 'yytext'
(long)yy_rule_linenum[yy_act], yytext );
^
token.yy.cc:911:14: error: use of undeclared identifier 'yytext'
yytext );
^
token.l:12:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::PLUS); return yy::parser::token::PLUS; }
^
token.l:13:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::MINUS); return yy::parser::token::MINUS; }
^
token.l:14:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::TIMES); return yy::parser::token::TIMES; }
^
token.l:15:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::DIVIDE); return yy::parser::token::DIVIDE; }
^
token.l:16:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::LPAREN); return yy::parser::token::LPAREN; }
^
token.l:17:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::RPAREN); return yy::parser::token::RPAREN; }
^
token.l:18:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::SEMICOLON); return yy::parser::token::SEMICOLON; }
^
token.l:19:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::EQUAL); return yy::parser::token::EQUAL; }
^
token.l:21:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<std::string>(yytext); return yy::parser::token::ID; }
^
token.l:21:32: error: use of undeclared identifier 'yytext'
{ yylval->emplace<std::string>(yytext); return yy::parser::token::ID; }
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [token.yy.o] Error 1
Could you please help me solve these issues?
Well, I read the Bison manual again and solved the issue myself.
In the Bison C++ example, the declaration of yylex is redefined:
// Give Flex the prototype of yylex we want ...
# define YY_DECL \
yy::parser::symbol_type yylex (driver& drv)
// ... and declare it for the parser's sake.
YY_DECL;
That's why a flex rule can return a complete symbol, like this:
return yy::parser::make_MINUS (loc);
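The trick is ordinary preprocessor work: flex emits its scanner as YY_DECL followed by the function body, so defining YY_DECL once gives the parser's prototype and flex's definition the same signature. A self-contained imitation, where symbol_type, scanner_state and the token kinds are hypothetical stand-ins for yy::parser::symbol_type, yyscan_t and the real tokens:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-ins: 'symbol_type' plays yy::parser::symbol_type and
// 'scanner_state' plays the reentrant yyscan_t handle.
enum class kind { end_of_input, plus, num, error };
struct symbol_type {
    kind k;
    int value;  // the number for kind::num, the offending char for kind::error
};
struct scanner_state {
    std::string input;
    std::size_t pos;
};

// One macro fixes the signature for both sides, as in the Bison C++ example.
#define YY_DECL symbol_type yylex(scanner_state& yyscanner)

// The parser's side: a prototype.
YY_DECL;

// Flex's side: the generated scanner is emitted as "YY_DECL { ... }".
YY_DECL {
    const std::string& in = yyscanner.input;
    std::size_t& pos = yyscanner.pos;
    while (pos < in.size() && in[pos] == ' ') ++pos;  // skip blanks
    if (pos == in.size()) return {kind::end_of_input, 0};
    if (in[pos] == '+') { ++pos; return {kind::plus, 0}; }
    if (!std::isdigit(static_cast<unsigned char>(in[pos])))
        return {kind::error, in[pos++]};
    int v = 0;
    while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos])))
        v = v * 10 + (in[pos++] - '0');
    return {kind::num, v};
}
```

In the real project the macro would name the actual types, e.g. yy::parser::symbol_type yylex(yyscan_t yyscanner). Keeping the parameter name yyscanner matters for a reentrant scanner, because flex's generated code refers to it by that name (as the yyg line in the error log shows).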
This is for a homework assignment. The only parts I've edited myself are the definitions, rules, and tokens. What I have so far compiles successfully, but it gives me a segmentation fault when I run it on the markdown (.md) file, and because of that the HTML output is just a blank file.
%{
#define YYSTYPE char *
#include <string.h>
#include "miniMD2html.tab.h"
extern YYSTYPE yylval;
%}
%option yylineno
/* Flex definitions */
whitespace [ \t]+
newline [\n]+|{whitespace}[\n]+
textword [a-zA-Z:/.\-,\']+
integer [0-9]+
header #|##|###|####|#####
%%
{header} { return T_HEADER; }
{integer} { return T_INTEGER; }
{textword} { return T_TEXTWORD; }
{whitespace} { return T_BLANK; }
{newline} { return T_NEWLINE; }
%%
The generate functions are given in another file. Most of them just accept char*, the generate_header function takes an int and char*, and the generate_image function takes two char* and two int. The grammar may look weird but this is what was given in the assignment.
%{
#include "global.h"
#include "stdlib.h"
#include "stdio.h"
#define YYSTYPE char *
extern int yylex();
int yywrap();
int yyerror(const char*);
int yyparse();
extern FILE *yyin;
Html_Doc *html_doc;
%}
/* Define tokens here */
%token T_BLANK T_NEWLINE
%token T_HEADER T_INTEGER T_TEXTWORD
%% /* Grammar rules and actions follow */
s: mddoc;
mddoc: /*empty*/ | mddoc paragraph;
paragraph: T_NEWLINE {add_linebreak(html_doc);}
| pcontent T_NEWLINE {add_element(html_doc, $1); free($1);} ;
pcontent: header
| rftext {generate_paragraph($1);}
header: T_HEADER T_BLANK rftext {generate_header(strlen($1), $3);}
rftext: rftext T_BLANK rftextword {strappend($1, $3);}
| rftext rftextword {strappend($1, $2);}
| rftextword
rftextword: textnum | image | format
image: "![" text "](" text '=' T_INTEGER '#' T_INTEGER ')' {generate_image($2, $4, atoi($6), atoi($8));}
format: "**" text "**" {generate_bold($2);}
| '_' text '_' {generate_italic($2);}
| "**" format "**" {generate_bold($2);}
| '_' format '_' {generate_italic($2);}
text: text T_BLANK textnum {strappend($1, $3);}
| text textnum {strappend($1, $2);}
| textnum
textnum: T_TEXTWORD | T_INTEGER
%%
int main(int argc, char *argv[]) {
// yydebug = 1;
FILE *fconfig = fopen(argv[1], "r");
// make sure it is valid
if (!fconfig) {
printf("Error reading file!\n");
return -1;
}
html_doc = new_html_doc();
// set lex to read from file
yyin = fconfig;
int ret = yyparse();
output_result(html_doc);
del_html_doc(html_doc);
return ret;
}
int yywrap(){
return 1;
}
int yyerror(const char* s){
extern int yylineno;
extern char *yytext;
printf("error while parsing line %d: %s at '%s', ASCII code: %d\n", yylineno, s, yytext, (int)(*yytext));
return 1;
}
None of your flex rules ever set the value of yylval, so it will be NULL throughout. And so will all the references to semantic values ($n) in your grammar. Since most functions which take a char* assume that it is a valid string, it's pretty likely that one of them will soon try to examine the string value, and the fact that the pointer is NULL will certainly lead to a segfault.
In addition, there are both single character and quoted string tokens in your grammar, none of which can be produced by your scanner. So it's quite likely that the parser will stop with a syntax error as soon as one of the non-word characters is encountered in the input.
In the bison file, every rule should be terminated by ;
s: mddoc;
mddoc: /*empty*/ | mddoc paragraph;
paragraph: ...
Notice the ; after mddoc paragraph. That rule is terminated correctly, but the rules that follow are not.
Also, as @Rockcat said, in the flex file you should add
yylval = strdup(yytext);
before returning your token to the bison file.
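Why the copy matters: yytext points into flex's input buffer, which is reused for the next token, so assigning yylval = yytext would leave earlier semantic values pointing at text that has since changed. A small sketch that imitates the reused buffer with a fixed array; fake_rule is a hypothetical stand-in for a flex action, and 258 is an arbitrary token number:

```cpp
#include <cassert>
#include <stdlib.h>
#include <string.h>  // strncpy/strcmp; also declares the POSIX strdup on POSIX systems

#define YYSTYPE char *

static char scan_buffer[64];        // stands in for flex's internal input buffer
static char *yytext = scan_buffer;  // flex points yytext into that buffer
YYSTYPE yylval = NULL;              // stays NULL unless a rule assigns it

// Hypothetical stand-in for a flex rule action such as {textword}.
int fake_rule(const char *lexeme) {
    // Scanning the next token reuses the same buffer, overwriting the old text.
    strncpy(scan_buffer, lexeme, sizeof scan_buffer - 1);
    yylval = strdup(yytext);  // copy the lexeme before returning the token
    return 258;               // arbitrary token number for this sketch
}
```

Each strdup'd string has to be freed eventually, as the paragraph rule already does with free($1).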
I am trying to create a lex scanner that reads through a header file, finds lexical errors, and writes them to a text output, but I keep running into an undefined error on the lines that contain the tokens {line} and {punc}. I'm completely new to lex, so I haven't been able to identify what is missing. Here is the part of my code that has the errors:
%{
#include <stdio.h>
#include <ctype.h>
#include "tokens.h"
%}
%{ option noyywrap
%}
ws [ \t\r]+
%%
[ \t\n] ;
. printf("Unexpected Character\n");
: return COLON;
{ws} { ECHO; }
{line} { ECHO; Listing::nextLine();}
"<" { ECHO; return(RELOP); }
begin { ECHO; return(BEGIN_); }
{punc} { ECHO; return yytext[0]; }
. { ECHO; Listing::appendError(LEXICAL, yytext); }
%%
int main()
{
yylex();
}
You have to define line and punc in the initial section, just as you did with ws. The error is telling you that it couldn't expand {line} and {punc} because there was no definition for those two identifiers.
I have a problem in compiling my code.
It works when main() is in the same file as the yacc parser, but it doesn't work when I put main() in another file.
This is my Flex file: (lex1.ll)
%{
#include <stdio.h>
#include "yac1.tab.hh"
extern "C"
{
int yylex(void);
}
int line_num = 1;
%}
alpha [A-Za-z]
digit [0-9]
%%
"DELETE ALL" return DELALL;
[ \t] ;
INSERT return INSERT;
DELETE return DELETE;
FIND return FIND;
[0-9]+\.[0-9]+ { yylval.fval = atof(yytext); return FLOAT; }
[0-9]+ { yylval.ival = atoi(yytext); return INT; }
[a-zA-Z0-9_]+ { yylval.sval = strdup(yytext);return STRING; }
\n { ++line_num; return ENDL; }
. ;
%%
This is my Bison file: (yac1.yy)
%{
#include <stdio.h>
#include <stdlib.h>
int intval;
void yyerror(const char *s)
{
fprintf(stderr, "error: %s\n", s);
}
extern "C"
{
int yyparse(void);
int yylex(void);
int yywrap()
{
return 1;
}
}
%}
%token INSERT DELETE DELALL FIND ENDL
%union {
int ival;
float fval;
char *sval;
}
%token <ival> INT
%token <fval> FLOAT
%token <sval> STRING
%%
S:T {printf("INPUT ACCEPTED....\n");exit(0);};
T: INSERT val {printf("hey insert FOUND\n");}
| DELETE val
| DELALL ENDL
| FIND val
;
val : INT ENDL {printf("hey %d\n",$1);intval=$1; }
|
FLOAT ENDL
|
STRING ENDL {printf("hey %s\n",$1);}
;
%%
/*
It works if I uncomment this block of code
int main()
{
while(1){
printf("Enter the string");
yyparse();
}
}
*/
This is my main program: (testlex.cc)
#include <stdio.h>
#include "lexheader.h"
#include "yac1.tab.hh"
extern int intval;
main()
{
/*
char * line = "INSERT 54\n";
YY_BUFFER_STATE bp = yy_scan_string( line );
yy_switch_to_buffer(bp);
yyparse();
yy_delete_buffer(bp);
printf ("hello %d",intval);
*/
printf("Enter the query:");
//while(1)
printf ("%d\n",yyparse());
}
And this is my Makefile
parser: lex1.ll yac1.yy testlex.cc
bison -d yac1.yy
flex --header-file="lexheader.h" lex1.ll
g++ -o parser yac1.tab.cc lex.yy.c testlex.cc -lfl
clean:
rm -rf *.o parser
When I compile I get this error.
bison -d yac1.yy
flex --header-file="lexheader.h" lex1.ll
g++ -o parser yac1.tab.cc lex.yy.c testlex.cc -lfl
testlex.cc: In function ‘int main()’:
testlex.cc:21:24: error: ‘yyparse’ was not declared in this scope
make: *** [parser] Error 1
PS: I have to compile with g++. With gcc, I have working code.
Any help in this regard is highly appreciated.
Thanks.
Did you read the GNU Bison documentation? It has a chapter about C++ parsers with a complete example (quite similar to yours).
You could explicitly declare yyparse as suggested by this answer, but making a real C++ parser is perhaps better.
Your Makefile is not very good. You could have something like
LEX= flex
YACC= bison
LIBES= -lfl
CXXFLAGS= -Wall
parser: lex1.o yac1.tab.o testlex.o
$(LINK.cc) -o $@ $^ $(LIBES)
lex1.cc: lex1.ll
$(LEX) --header-file="lexheader.h" -o $@ $<
yac1.tab.cc: yac1.yy
$(YACC) -d $<
Also take the time to read the GNU make documentation. You might want to use remake (as remake -x) to debug your Makefile.
You need to declare the yyparse function in the testlex.cc file:
int yyparse();
This is what is known as a function prototype, and tells the compiler that the function exists and can be called.
After looking a little closer at your source I now know the reason why the existing prototype didn't work: It's because you declared it as extern "C" but compiled the file as a C++ source. The extern "C" told the compiler that the yyparse function was an old C style function but then you continued to compile the source with a C++ compiler. This caused a name mismatch.
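One way to keep the prototype and the definition from disagreeing about linkage is to put the declarations in a single header that both the parser file and testlex.cc include. A sketch of that pattern; the header name parser_api.h, the stub bodies and run_query are hypothetical, and in a real build the definitions would stay in yac1.tab.cc:

```cpp
#include <cassert>
#include <cstdio>

// ---- contents of a hypothetical shared header, e.g. "parser_api.h" ----
// Both the parser source and testlex.cc would include it, so they agree.
#ifdef __cplusplus
extern "C" {
#endif
int yyparse(void);   // the prototype testlex.cc was missing
extern int intval;   // defined in the parser file
#ifdef __cplusplus
}
#endif
// ------------------------------------------------------------------------

// Hypothetical stand-ins for the definitions in yac1.tab.cc. Because the
// header declared them extern "C" first, these definitions keep C linkage.
int intval = 0;
int yyparse(void) {
    intval = 54;  // pretend "INSERT 54\n" was parsed
    return 0;     // 0 means the parse succeeded
}

// What testlex.cc's main() can do once the prototype is visible.
int run_query(void) {
    std::printf("Enter the query:");
    return yyparse();
}
```

Because every file sees the same declaration, the linkage stays consistent whether the parser file is compiled with gcc or g++.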
I would like to use the code generated by lex in another program I have, but all the examples I have seen embed the main function inside the lex file, not the other way around.
Is it possible to use (include) the C file generated by lex in other code, so that I have something like this (not necessarily the same)?
#include<something>
int main(){
Lexer l = Lexer("some string or input file");
while (l.has_next()){
Token * token = l.get_next_token();
//somecode
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
This is what I would start with:
Note: this is an example of using the C interface.
To use the C++ interface, add %option c++ (see below).
test.lex
IdentPart1 [A-Za-z_]
Identifier {IdentPart1}[A-Za-z_0-9]*
WHITESPACE [ \t\r\n]
%option noyywrap
%%
{Identifier} {return 257;}
{WHITESPACE} {/* Ignore */}
. {return 258;}
%%
// This is the bit you want.
// It is best just to put this at the bottom of the lex file.
// By default, functions are extern, so you can create a header file that
// declares these as extern and then include that header in your code (see Lexer.h).
void* setUpBuffer(char const* text)
{
YY_BUFFER_STATE buffer = yy_scan_string(text);
yy_switch_to_buffer(buffer);
return buffer;
}
void tearDownBuffer(void* buffer)
{
yy_delete_buffer((YY_BUFFER_STATE)buffer);
}
Lexer.h
#ifndef LOKI_A_LEXER_H
#define LOKI_A_LEXER_H
#include <string>
extern int yylex();
extern char* yytext;
extern int yyleng;
// Here is the interface to the lexer you set up above
extern void* setUpBuffer(char const* text);
extern void tearDownBuffer(void* buffer);
class Lexer
{
std::string token;
std::string text;
void* buffer;
public:
Lexer(std::string const& t)
: text(t)
{
// Use the interface to set up the buffer
buffer = setUpBuffer(text.c_str());
}
~Lexer()
{
// Tear down your interface
tearDownBuffer(buffer);
}
// Don't use RAW pointers
// This is only a quick and dirty example.
bool nextToken()
{
int val = yylex();
if (val != 0)
{
token = std::string(yytext, yyleng);
}
return val;
}
std::string const& theToken() const {return token;}
};
#endif
main.cpp
#include "Lexer.h"
#include <iostream>
int main()
{
Lexer l("some string or input file");
// Did not like your hasToken() interface.
// Just call nextToken() until it fails.
while (l.nextToken())
{
std::cout << l.theToken() << "\n";
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
Build
> flex test.lex
> g++ main.cpp lex.yy.c
> ./a.out
some
string
or
input
file
>
Alternatively, you can use the C++ interface to flex (it's experimental):
test.lex
%option c++
IdentPart1 [A-Za-z_]
Identifier {IdentPart1}[A-Za-z_0-9]*
WHITESPACE [ \t\r\n]
%%
{Identifier} {return 257;}
{WHITESPACE} {/* Ignore */}
. {return 258;}
%%
// Note this needs to be here
// If you define no yywrap() in the options it gets added to the header file
// which leads to multiple definitions if you are not careful.
int yyFlexLexer::yywrap() { return 1;}
main.cpp
#include "MyLexer.h"
#include <iostream>
#include <sstream>
int main()
{
std::istringstream data("some string or input file");
yyFlexLexer l(&data, &std::cout);
while (l.yylex())
{
std::cout << std::string(l.YYText(), l.YYLeng()) << "\n";
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
build
> flex --header-file=MyLexer.h test.lex
> g++ main.cpp lex.yy.cc
> ./a.out
some
string
or
input
file
>
Sure. I'm not sure about the generated class; we use the C-generated parsers and call them from C++. Or you can insert any sort of wrapper code you want in the lex file, and call anything there from outside the generated file.
The keywords are %option reentrant or %option c++.
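As a sketch of the "call them from C++" side, a small RAII class can own one reentrant scanner handle so that yylex_destroy always runs. So the snippet compiles on its own, yylex_init and yylex_destroy below are hypothetical stubs shaped like flex's reentrant functions; in a real build you would delete them and include the header flex generates with --header-file, which provides yyscan_t and those functions.

```cpp
#include <cassert>
#include <stdexcept>

// ---- Hypothetical stubs shaped like flex's reentrant API --------------------
typedef void *yyscan_t;
static int live_scanners = 0;  // bookkeeping for this sketch only
int yylex_init(yyscan_t *scanner) { *scanner = new int(0); ++live_scanners; return 0; }
int yylex_destroy(yyscan_t scanner) { delete static_cast<int *>(scanner); --live_scanners; return 0; }
// -----------------------------------------------------------------------------

// C++ wrapper: owns one reentrant scanner and always destroys it.
class ScannerHandle {
public:
    ScannerHandle() {
        if (yylex_init(&scanner_) != 0) throw std::runtime_error("yylex_init failed");
    }
    ~ScannerHandle() { yylex_destroy(scanner_); }
    ScannerHandle(const ScannerHandle &) = delete;             // one owner per scanner
    ScannerHandle &operator=(const ScannerHandle &) = delete;
    yyscan_t get() const { return scanner_; }                 // pass to yylex(scanner)
private:
    yyscan_t scanner_ = nullptr;
};
```

Because each handle carries its own state, two scanners can coexist, which is what the reentrant option buys you.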
As an example, here's the ncr2a scanner:
/** ncr2a_lex.l: Replace all NCRs by corresponding printable ASCII characters. */
%%
&#(1([01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]); { /* accept 32..126 */
/** `+2` skips '&#', `atoi()` ignores ';' at the end */
fputc(atoi(yytext + 2), yyout); /* non-recursive version */
}
The scanner code can be left unchanged.
Here is the program that uses it:
/** ncr2a.c */
#include "ncr2a_lex.h"
typedef struct {
int i,j; /** put here whatever you need to keep extra state */
} State;
int main () {
yyscan_t scanner;
State my_custom_data = {0,0};
yylex_init(&scanner);
yyset_extra(&my_custom_data, scanner);
yylex(scanner);
yylex_destroy(scanner);
return 0;
}
To build ncr2a executable:
flex -R -oncr2a_lex.c --header-file=ncr2a_lex.h ncr2a_lex.l
cc -c -o ncr2a_lex.o ncr2a_lex.c
cc -o ncr2a ncr2a_lex.o ncr2a.c -lfl
Example
$ echo 'three colons :::' | ./ncr2a
three colons :::
This example uses stdin/stdout as input/output and it calls yylex() once.
To read from a file with this reentrant scanner, set its input before calling yylex():
yyset_in(fopen("input.txt", "r"), scanner);
@Loki Astari's answer shows how to read from a string (for a reentrant scanner: buffer = yy_scan_string(text, scanner); yy_switch_to_buffer(buffer, scanner)).
To make yylex() return once per token, add a return statement to each rule in the *.l file that matches a complete token.