Using lex generated source code in another file - c++

I would like to use the code generated by lex from other code that I have, but all the examples I have seen embed the main function inside the lex file, not the other way around.
Is it possible to use (include) the C file generated by lex in other code, so as to have something like this (not necessarily identical)?
#include<something>
int main(){
Lexer l = Lexer("some string or input file");
while (l.has_next()){
Token * token = l.get_next_token();
//somecode
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}

This is what I would start with.
Note: this is an example of using the C interface.
To use the C++ interface, add %option c++ (see below).
test.lex
IdentPart1 [A-Za-z_]
Identifier {IdentPart1}[A-Za-z_0-9]*
WHITESPACE [ \t\r\n]
%option noyywrap
%%
{Identifier} {return 257;}
{WHITESPACE} {/* Ignore */}
. {return 258;}
%%
// This is the bit you want.
// It is best just to put this at the bottom of the lex file.
// By default functions are extern, so you can create a header file that
// declares these as extern and then include that header in your code (see Lexer.h).
void* setUpBuffer(char const* text)
{
YY_BUFFER_STATE buffer = yy_scan_string(text);
yy_switch_to_buffer(buffer);
return buffer;
}
void tearDownBuffer(void* buffer)
{
yy_delete_buffer((YY_BUFFER_STATE)buffer);
}
Lexer.h
#ifndef LOKI_A_LEXER_H
#define LOKI_A_LEXER_H
#include <string>
extern int yylex();
extern char* yytext;
extern int yyleng;
// Here is the interface to the lexer you set up above
extern void* setUpBuffer(char const* text);
extern void tearDownBuffer(void* buffer);
class Lexer
{
std::string token;
std::string text;
void* buffer;
public:
Lexer(std::string const& t)
: text(t)
{
// Use the interface to set up the buffer
buffer = setUpBuffer(text.c_str());
}
~Lexer()
{
// Tear down your interface
tearDownBuffer(buffer);
}
// Don't use RAW pointers
// This is only a quick and dirty example.
bool nextToken()
{
int val = yylex();
if (val != 0)
{
token = std::string(yytext, yyleng);
}
return val;
}
std::string const& theToken() const {return token;}
};
#endif
main.cpp
#include "Lexer.h"
#include <iostream>
int main()
{
Lexer l("some string or input file");
// I did not like your has_next() interface.
// Just call nextToken() until it fails.
while (l.nextToken())
{
std::cout << l.theToken() << "\n";
}
return 0;
}
Build
> flex test.lex
> g++ main.cpp lex.yy.c
> ./a.out
some
string
or
input
file
>
Alternatively, you can use the C++ interface to flex (it's experimental).
test.lex
%option c++
IdentPart1 [A-Za-z_]
Identifier {IdentPart1}[A-Za-z_0-9]*
WHITESPACE [ \t\r\n]
%%
{Identifier} {return 257;}
{WHITESPACE} {/* Ignore */}
. {return 258;}
%%
// Note: this needs to be here.
// If you specify %option noyywrap, yywrap() gets added to the header file,
// which leads to multiple definitions if you are not careful.
int yyFlexLexer::yywrap() { return 1;}
main.cpp
#include "MyLexer.h"
#include <iostream>
#include <sstream>
int main()
{
std::istringstream data("some string or input file");
yyFlexLexer l(&data, &std::cout);
while (l.yylex())
{
std::cout << std::string(l.YYText(), l.YYLeng()) << "\n";
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
Build
> flex --header-file=MyLexer.h test.lex
> g++ main.cpp lex.yy.cc
> ./a.out
some
string
or
input
file
>
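To get closer to the Token-based interface sketched in the question, the wrapper can hand back a small value type instead of a raw pointer. The sketch below is self-contained and makes some assumptions: fakeYylex is a hypothetical hand-written stand-in for the flex-generated yylex() (it mimics the three rules above, returning 257 for an identifier, 258 for any other non-whitespace character, and 0 at end of input), and Token and TokenLexer are illustrative names, not part of flex. In a real build, nextToken() would call yylex() and copy yytext/yyleng instead.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Simple value type: the token kind plus the matched text (the lexeme).
struct Token {
    int type;
    std::string lexeme;
};

class TokenLexer {
    std::string text;
    std::size_t pos = 0;

    static bool isIdentStart(char c) {
        return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
    }
    static bool isIdentPart(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }
    static bool isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Hypothetical stand-in for yylex(): skips whitespace, then returns
    // 257 for an identifier, 258 for any other character, 0 at end of input.
    int fakeYylex(std::string& lexeme) {
        while (pos < text.size() && isSpace(text[pos])) ++pos;
        if (pos >= text.size()) return 0;
        std::size_t start = pos;
        if (isIdentStart(text[pos])) {
            while (pos < text.size() && isIdentPart(text[pos])) ++pos;
            lexeme = text.substr(start, pos - start);
            return 257;
        }
        ++pos;
        lexeme = text.substr(start, 1);
        return 258;
    }

public:
    explicit TokenLexer(std::string const& t) : text(t) {}

    // Fills `out` and returns true, or returns false at end of input.
    bool nextToken(Token& out) {
        std::string lexeme;
        int type = fakeYylex(lexeme);
        if (type == 0) return false;
        out.type = type;
        out.lexeme = lexeme;
        return true;
    }
};
```

A caller can then loop with Token t; while (lexer.nextToken(t)) { ... } and never needs to delete anything, because each Token owns its own copy of the lexeme.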

Sure. I'm not sure about the generated class; we use the C generated
parsers, and call them from C++. Or you can insert any sort of wrapper
code you want in the lex file, and call anything there from outside of
the generated file.

The keywords are %option reentrant or %option c++.
As an example here's the ncr2a scanner:
/** ncr2a_lex.l: Replace all NCRs by corresponding printable ASCII characters. */
%%
&#(1([01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]); { /* accept 32..126 */
/** `+2` skips '&#', `atoi()` ignores ';' at the end */
fputc(atoi(yytext + 2), yyout); /* non-recursive version */
}
The scanner code can be left unchanged.
Here is the program that uses it:
/** ncr2a.c */
#include "ncr2a_lex.h"
typedef struct {
int i,j; /** put here whatever you need to keep extra state */
} State;
int main () {
yyscan_t scanner;
State my_custom_data = {0,0};
yylex_init(&scanner);
yyset_extra(&my_custom_data, scanner);
yylex(scanner);
yylex_destroy(scanner);
return 0;
}
To build the ncr2a executable:
flex -R -oncr2a_lex.c --header-file=ncr2a_lex.h ncr2a_lex.l
cc -c -o ncr2a_lex.o ncr2a_lex.c
cc -o ncr2a ncr2a_lex.o ncr2a.c -lfl
Example:
$ echo 'three colons &#58;&#58;&#58;' | ./ncr2a
three colons :::
This example uses stdin/stdout as input/output, and it calls yylex() once.
To read from a file, set the input stream before calling yylex(). With a reentrant scanner, do this from outside the scanner file via the accessor:
yyset_in(fopen("input.txt", "r"), scanner);
@Loki Astari's answer shows how to read from a string (buffer = yy_scan_string(text, scanner); yy_switch_to_buffer(buffer, scanner)).
To call yylex() once per token, add a return statement to each rule in the *.l file that matches a complete token.

Related

Flex/Bison Markdown to HTML Program

This is for a homework assignment. The only code I've edited myself are the definitions, rules, and tokens. What I have so far compiles successfully but gives me a segmentation fault when I try to run it on the markdown file (.md), and the HTML output is just a blank file because of that.
%{
#define YYSTYPE char *
#include <string.h>
#include "miniMD2html.tab.h"
extern YYSTYPE yylval;
%}
%option yylineno
/* Flex definitions */
whitespace [ \t]+
newline [\n]+|{whitespace}[\n]+
textword [a-zA-Z:/.\-,\']+
integer [0-9]+
header #|##|###|####|#####
%%
{header} { return T_HEADER; }
{integer} { return T_INTEGER; }
{textword} { return T_TEXTWORD; }
{whitespace} { return T_BLANK; }
{newline} { return T_NEWLINE; }
%%
The generate functions are given in another file. Most of them just accept char*, the generate_header function takes an int and char*, and the generate_image function takes two char* and two int. The grammar may look weird but this is what was given in the assignment.
%{
#include "global.h"
#include "stdlib.h"
#include "stdio.h"
#define YYSTYPE char *
extern int yylex();
int yywrap();
int yyerror(const char*);
int yyparse();
extern FILE *yyin;
Html_Doc *html_doc;
%}
/* Define tokens here */
%token T_BLANK T_NEWLINE
%token T_HEADER T_INTEGER T_TEXTWORD
%% /* Grammar rules and actions follow */
s: mddoc;
mddoc: /*empty*/ | mddoc paragraph;
paragraph: T_NEWLINE {add_linebreak(html_doc);}
| pcontent T_NEWLINE {add_element(html_doc, $1); free($1);} ;
pcontent: header
| rftext {generate_paragraph($1);}
header: T_HEADER T_BLANK rftext {generate_header(strlen($1), $3);}
rftext: rftext T_BLANK rftextword {strappend($1, $3);}
| rftext rftextword {strappend($1, $2);}
| rftextword
rftextword: textnum | image | format
image: "![" text "](" text '=' T_INTEGER '#' T_INTEGER ')' {generate_image($2, $4, atoi($6), atoi($8));}
format: "**" text "**" {generate_bold($2);}
| '_' text '_' {generate_italic($2);}
| "**" format "**" {generate_bold($2);}
| '_' format '_' {generate_italic($2);}
text: text T_BLANK textnum {strappend($1, $3);}
| text textnum {strappend($1, $2);}
| textnum
textnum: T_TEXTWORD | T_INTEGER
%%
int main(int argc, char *argv[]) {
// yydebug = 1;
FILE *fconfig = fopen(argv[1], "r");
// make sure it is valid
if (!fconfig) {
printf("Error reading file!\n");
return -1;
}
html_doc = new_html_doc();
// set lex to read from file
yyin = fconfig;
int ret = yyparse();
output_result(html_doc);
del_html_doc(html_doc);
return ret;
}
int yywrap(){
return 1;
}
int yyerror(const char* s){
extern int yylineno;
extern char *yytext;
printf("error while parsing line %d: %s at '%s', ASCII code: %d\n", yylineno, s, yytext, (int)(*yytext));
return 1;
}
None of your flex rules ever set the value of yylval, so it will be NULL throughout. And so will all the references to semantic values ($n) in your grammar. Since most functions which take a char* assume that it is a valid string, it's pretty likely that one of them will soon try to examine the string value, and the fact that the pointer is NULL will certainly lead to a segfault.
In addition, there are both single character and quoted string tokens in your grammar, none of which can be produced by your scanner. So it's quite likely that the parser will stop with a syntax error as soon as one of the non-word characters is encountered in the input.
In the bison file, every rule should be terminated by a ;
s: mddoc;
mddoc: /*empty*/ | mddoc paragraph;
paragraph: ...
Notice the ; after mddoc paragraph.
That part is correct, but the rules that follow are not terminated properly.
Also, as @Rockcat said, in the flex file you should add
yylval = strdup(yytext);
before returning each token to the bison parser.
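The copy matters because flex reuses its internal buffer: the pointer in yytext is only valid until the next call to yylex(). The following self-contained C++ sketch imitates that behaviour. scanBuffer and fakeScan are hypothetical stand-ins for yytext and yylex(), not flex APIs, and keepPointers and keepCopies are illustrative helper names.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for flex's yytext: one buffer reused for every token.
static char scanBuffer[32];

// Hypothetical stand-in for yylex(): overwrites the shared buffer each call.
const char* fakeScan(const char* word) {
    std::strncpy(scanBuffer, word, sizeof scanBuffer - 1);
    scanBuffer[sizeof scanBuffer - 1] = '\0';
    return scanBuffer;
}

// Problematic: keeps the raw pointer, comparable to `yylval = yytext;`.
// Every stored value ends up pointing at the same, latest text.
std::vector<const char*> keepPointers(const std::vector<const char*>& words) {
    std::vector<const char*> values;
    for (const char* w : words) values.push_back(fakeScan(w));
    return values;
}

// Safe: takes one copy per token, comparable to `yylval = strdup(yytext);`.
std::vector<std::string> keepCopies(const std::vector<const char*>& words) {
    std::vector<std::string> values;
    for (const char* w : words) values.push_back(std::string(fakeScan(w)));
    return values;
}
```

With strdup() in the real scanner, each copy has to be released with free() once the parser is done with it, which is what the free($1) in the paragraph rule already does for one of the values.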

Flex, Bison, C++ all in Xcode

I'm working through "Problems with reentrant Flex and Bison". It compiles and runs just fine on my machine. What I want to do, though, is make use of the C++ STL. Any time I try to include a C++ header, it says it can't be found. There are only a handful of questions about this on Google. Does anyone have a working example of this sort of setup, or a solution I might implement?
Any help would be greatly appreciated.
Thanks!
EDIT So for one reason or another, I have to add the include path of any headers in the build settings. Must be due to the custom makefile of this person's example. It's above my pay-grade. Anyway, I can now use STL libraries inside of main.
WHAT I REALLY WANT TO DO IS USE FLEX/BISON WITH CPP, AND IF I TRY TO INCLUDE STL HEADERS ANYWHERE BUT MAIN, I GET ERROR "HEADER NOT FOUND".
I can include C-headers just fine, though.
Here's an answer from the author of another answer in the linked topic.
I have adapted my example to work with C++.
The key points are:
I am using recent Flex/Bison: brew install flex and brew install bison. I am not sure whether the same will work with the default flex/bison shipped with OS X/Xcode.
The generated flex/bison files should have C++ extensions (lexer.[hpp|mm], parser.[hpp|mm]) so that Xcode picks them up as C++ code.
There is an Xcode Build Phase that runs Make.
All the relevant files follow below, but I recommend that you check out the example project.
main.mm's code is
#include "parser.hpp"
#include "lexer.hpp"
extern YY_BUFFER_STATE yy_scan_string(const char * str);
extern void yy_delete_buffer(YY_BUFFER_STATE buffer);
ParserConsumer *parserConsumer = [ParserConsumer new];
char input[] = "RAINBOW UNICORN 1234 UNICORN";
YY_BUFFER_STATE state = yy_scan_string(input);
yyparse(parserConsumer);
yy_delete_buffer(state);
Lexer.lm:
%{
#include "ParserConsumer.h"
#include "parser.hpp"
#include <iostream>
#include <cstdio>
int yylex(void);
void yyerror(id <ParserConsumer> consumer, const char *msg);
%}
%option header-file = "./Parser/Generated Code/lexer.hpp"
%option outfile = "./Parser/Generated Code/lexer.mm"
%option noyywrap
NUMBER [0-9]+
STRING [A-Z]+
SPACE \x20
%%
{NUMBER} {
yylval.numericValue = (int)strtoul(yytext, NULL, 10);
std::cout << "Lexer says: Hello from C++\n";
printf("[Lexer, number] %s\n", yytext);
return Token_Number;
}
{STRING} {
yylval.stringValue = strdup(yytext);
printf("[Lexer, string] %s\n", yytext);
return Token_String;
}
{SPACE} {
// Do nothing
}
<<EOF>> {
printf("<<EOF>>\n");
return 0;
}
%%
void yyerror (id <ParserConsumer> consumer, const char *msg) {
printf("%s\n", msg);
abort();
}
Parser.ym:
%{
#include <iostream>
#include <cstdio>
#include "ParserConsumer.h"
#include "parser.hpp"
#include "lexer.hpp"
int yylex();
void yyerror(id <ParserConsumer> consumer, const char *msg);
%}
%output "Parser/Generated Code/parser.mm"
%defines "Parser/Generated Code/parser.hpp"
//%define api.pure full
%define parse.error verbose
%parse-param { id <ParserConsumer> consumer }
%union {
char *stringValue;
int numericValue;
}
%token <stringValue> Token_String
%token <numericValue> Token_Number
%%
/* http://www.tldp.org/HOWTO/Lex-YACC-HOWTO-6.html 6.2 Recursion: 'right is wrong' */
tokens: /* empty */
| tokens token
token:
Token_String {
std::cout << "Parser says: Hello from C++\n";
printf("[Parser, string] %s\n", $1);
[consumer parserDidParseString:$1];
free($1);
}
| Token_Number {
printf("[Parser, number]\n");
[consumer parserDidParseNumber:$1];
}
%%
Makefile:
generate-parser: clean flex bison
clean:
rm -rf './Parser/Generated Code'
mkdir -p './Parser/Generated Code'
flex:
# brew install flex
/usr/local/bin/flex ./Parser/Lexer.lm
bison:
# brew install bison
/usr/local/bin/bison -d ./Parser/Parser.ym

How to get C/C++ module information with libclang

I am trying to use the module functionality of libclang. Here is the context:
I have a clang module defined and a source file that uses it:
module.modulemap
module test {
requires cplusplus
header "test.h"
}
test.h :
#pragma once
static inline int foo() { return 1; }
test.cpp :
// Try the following command:
// clang++ -fmodules -fcxx-modules -fmodules-cache-path=./cache_path -c test.cpp
// If you see stuff in the ./cache_path directory, then it works!
#include "test.h"
int main(int, char **) {
return foo();
}
cache_path is empty at first; after running the command I can see files in it, so this is working.
My problem arises when I try to use libclang to parse the test.cpp file in order to get information about the module:
#include <stdio.h>
#include "clang-c/Index.h"
/*
compile with:
clang -lclang -o module_parser module_parser.c
*/
static enum CXChildVisitResult
visitor(CXCursor cursor, CXCursor parent, CXClientData data)
{
CXSourceLocation loc;
CXFile file;
CXString module_import;
CXModule module;
CXString module_name;
CXString module_full_name;
unsigned line;
unsigned column;
unsigned offset;
if (clang_getCursorKind(cursor) == CXCursor_ModuleImportDecl)
{
loc = clang_getCursorLocation(cursor);
clang_getSpellingLocation(loc,
&file,
&line,
&column,
&offset);
module_import = clang_getCursorSpelling(cursor);
printf("Module import dec at line: %d \"%s\"\n", line, clang_getCString(module_import));
clang_disposeString(module_import);
}
module = clang_Cursor_getModule(cursor);
module_name = clang_Module_getName(module);
module_full_name = clang_Module_getFullName(module);
printf("Module name %s , full name %s\n", clang_getCString(module_name),
clang_getCString(module_full_name));
clang_disposeString(module_name);
clang_disposeString(module_full_name);
return CXChildVisit_Recurse; // visit complete AST recursivly
}
int main(int argc, char *argv[]) {
CXIndex Index = clang_createIndex(0, 1);
const char *args[] = { "-x",
"c++",
"-fmodules",
"-fcxxmodules"//,
"-fmodules-cache-path",
"cache_path"
};
CXTranslationUnit TU = clang_createTranslationUnitFromSourceFile(Index,
"test.cpp",
6,
args,
0,
0);
clang_visitChildren(clang_getTranslationUnitCursor(TU), visitor, 0);
clang_disposeTranslationUnit(TU);
clang_disposeIndex(Index);
return 0;
}
The output of this code is :
...
Module name , full name
Module name , full name
Module name , full name
Module name , full name
Module name , full name
...
First, it seems that clang doesn't detect any cursor of kind CXCursor_ModuleImportDecl, and at no point does it find a valid module.
What am I doing wrong?
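One thing worth checking in the code above, independent of libclang itself: the comma after "-fcxxmodules" is commented out, so the compiler merges that literal with the next one and the array holds five strings, not the six passed as the argument count. The clang++ command quoted in test.cpp also spells the options -fcxx-modules and -fmodules-cache-path=./cache_path, which differs from the strings in args. This may not be the whole explanation, but it is easy to rule out. A minimal C++ sketch of the concatenation effect (countArgs and fourthArg are made-up helper names):

```cpp
#include <cstddef>
#include <cstring>

// Same layout as the args array in the question: the comma after
// "-fcxxmodules" is commented out, so the two adjacent string literals
// are concatenated into a single array element.
static const char* const args[] = { "-x",
                                    "c++",
                                    "-fmodules",
                                    "-fcxxmodules"//,
                                    "-fmodules-cache-path",
                                    "cache_path"
                                  };

// Number of elements the array really has.
std::size_t countArgs() {
    return sizeof(args) / sizeof(args[0]);
}

// The merged fourth element.
const char* fourthArg() {
    return args[3];
}
```

Here countArgs() yields 5 and fourthArg() is the single string "-fcxxmodules-fmodules-cache-path", so passing 6 as the count reads one element past the end of the array. Deriving the count with sizeof(args) / sizeof(args[0]) avoids that mismatch.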

Include generated code by flex and bison

I'm working with Flex and Bison in C++. I am learning to use these tools, and the best way to start is by building a simple calculator. Once I have generated the application (the executable) from my calc.y and calc.l files, I can run the .exe file and use it. Now I want to include the generated code in a C++ file to use it in my application, but I can't. I think it's my fault: either I'm including the generated file incorrectly or I'm generating the wrong code to import.
main.cpp
#include <iostream>
extern "C" {
#include "y.tab.h"
}
int main ( int argc, char *argv[] ) {
yyparse();
printf(elementos);
return 0;
}
calc.l
%{
#include "y.tab.h"
#include <stdlib.h>
void yyerror(char *);
%}
%%
[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}
[-+()\n] {
return *yytext;
}
[ \t] ;
. {
yyerror("Invalid character.");
}
%%
int yywrap(void) {
return 1;
}
calc.y
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
int sym[26];
int elementos = 0;
%}
%token INTEGER VARIABLE
%left '+' '-'
%left '*' '/'
%%
program:
program expr '\n' { printf("%d\n", $2 ); }
|
;
statement:
expr { printf("%d\n", $1); }
| VARIABLE '=' expr { sym[$1] = $3; }
;
expr:
INTEGER { $$ = $1; }
| expr '+' expr { $$ = $1 + $3; elementos = elementos + 1;}
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '(' expr ')' { $$ = $2; }
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(void) {
yyparse();
return 0;
}
The y.tab.h is generated by bison. When I'm trying to compile main.cpp I'm getting an error:
command: gcc main.cpp -o main.exe
result: main.cpp: In function 'int main(int, char**)':
main.cpp:8:10: error: 'yyparse' was not declared in this scope
main.cpp:9:9: error: 'elementos' was not declared in this scope
How can I fix it?
I'm using gcc version 4.7.2, bison 2.4.1 and flex 2.5.4 on Windows 8.1.
Thanks!
EDIT:
The y.tab.h file is:
/* Tokens. */
#ifndef YYTOKENTYPE
# define YYTOKENTYPE
/* Put the tokens into the symbol table, so that GDB and other debuggers
know about them. */
enum yytokentype {
INTEGER = 258,
VARIABLE = 259
};
#endif
/* Tokens. */
#define INTEGER 258
#define VARIABLE 259
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
# define YYSTYPE_IS_TRIVIAL 1
# define yystype YYSTYPE /* obsolescent; will be withdrawn */
# define YYSTYPE_IS_DECLARED 1
#endif
extern YYSTYPE yylval;
There is no "elementos" variable in it, but looking in the generated y.tab.c file, I found that it is defined there!
You have a number of problems:
Bison and Flex generate C code, which you then need to compile and link with your program. Your question shows no indication that you have done this.
If you want to be able to use the elementos variable in your main.cpp file, then you need to declare it. It may be defined somewhere else, but the compiler doesn't know that when it compiles main.cpp. Add this line inside the extern "C" part: extern int elementos;
You have two different main functions.
In main.cpp, you #include iostream, but then use printf from stdio.
The call to printf is wrong. It needs a format string.
Bison shows several warnings, which you probably need to read and do something about if you want your program to work.
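Putting the declaration and printf points together, main.cpp could look roughly like the sketch below. To keep it compilable on its own, the block also defines stand-ins for yyparse() and elementos. That is an assumption made purely for illustration: in the real program those definitions come from y.tab.c (compiled as C and linked in) and must not be repeated, and the second main() in calc.y has to be removed. runCalculator is an illustrative name.

```cpp
#include <cstdio>

// What main.cpp needs to see: declarations only, with C linkage because
// the generated parser is assumed to be compiled as C.
extern "C" {
    int yyparse(void);
    extern int elementos;
}

// Stand-ins so this sketch links by itself. In the real build these
// definitions live in y.tab.c and this block would be deleted.
extern "C" {
    int elementos = 0;
    int yyparse(void) {
        elementos = 2;  // pretend two additions were parsed
        return 0;       // 0 means the parse succeeded
    }
}

// Runs the parser, then prints the counter with a proper format string.
int runCalculator() {
    int status = yyparse();
    std::printf("%d\n", elementos);
    return status;
}
```

In the actual main(), the body of runCalculator() would simply replace the original yyparse(); printf(elementos); pair.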

Compilation error (g++, bison, flex) when main is in another file

I have a problem in compiling my code.
It works when main() is in the same file as the yacc parser, but it does not work when I put main() in another file.
This is my Flex file: (lex1.ll)
%{
#include <stdio.h>
#include "yac1.tab.hh"
extern "C"
{
int yylex(void);
}
int line_num = 1;
%}
alpha [A-Za-z]
digit [0-9]
%%
"DELETE ALL" return DELALL;
[ \t] ;
INSERT return INSERT;
DELETE return DELETE;
FIND return FIND;
[0-9]+\.[0-9]+ { yylval.fval = atof(yytext); return FLOAT; }
[0-9]+ { yylval.ival = atoi(yytext); return INT; }
[a-zA-Z0-9_]+ { yylval.sval = strdup(yytext);return STRING; }
\n { ++line_num; return ENDL; }
. ;
%%
This is my Bison file: (yac1.yy)
%{
#include <stdio.h>
#include <stdlib.h>
int intval;
void yyerror(const char *s)
{
fprintf(stderr, "error: %s\n", s);
}
extern "C"
{
int yyparse(void);
int yylex(void);
int yywrap()
{
return 1;
}
}
%}
%token INSERT DELETE DELALL FIND ENDL
%union {
int ival;
float fval;
char *sval;
}
%token <ival> INT
%token <fval> FLOAT
%token <sval> STRING
%%
S:T {printf("INPUT ACCEPTED....\n");exit(0);};
T: INSERT val {printf("hey insert FOUND\n");}
| DELETE val
| DELALL ENDL
| FIND val
;
val : INT ENDL {printf("hey %d\n",$1);intval=$1; }
|
FLOAT ENDL
|
STRING ENDL {printf("hey %s\n",$1);}
;
%%
/*
It works if I uncomment this block of code
int main()
{
while(1){
printf("Enter the string");
yyparse();
}
}
*/
This is my main program: (testlex.cc)
#include <stdio.h>
#include "lexheader.h"
#include "yac1.tab.hh"
extern int intval;
main()
{
/*
char * line = "INSERT 54\n";
YY_BUFFER_STATE bp = yy_scan_string( line );
yy_switch_to_buffer(bp);
yyparse();
yy_delete_buffer(bp);
printf ("hello %d",intval);
*/
printf("Enter the query:");
//while(1)
printf ("%d\n",yyparse());
}
And this is my Makefile
parser: lex1.ll yac1.yy testlex.cc
bison -d yac1.yy
flex --header-file="lexheader.h" lex1.ll
g++ -o parser yac1.tab.cc lex.yy.c testlex.cc -lfl
clean:
rm -rf *.o parser
When I compile I get this error.
bison -d yac1.yy
flex --header-file="lexheader.h" lex1.ll
g++ -o parser yac1.tab.cc lex.yy.c testlex.cc -lfl
testlex.cc: In function ‘int main()’:
testlex.cc:21:24: error: ‘yyparse’ was not declared in this scope
make: *** [parser] Error 1
PS: I have to compile with g++. With gcc I have working code.
Any help in this regard is highly appreciated.
Thanks.
Did you read the documentation of GNU Bison? It has a chapter about C++ parsers with a complete example (quite similar to yours).
You could explicitly declare yyparse as suggested by this answer, but making a real C++ parser is perhaps better.
Your Makefile is not very good. You could have something like
LEX= flex
YACC= bison
LIBES= -lfl
CXXFLAGS= -Wall
parser: lex1.o yac1.tab.o testlex.o
	$(LINK.cc) -o $@ $^ $(LIBES)
lex1.cc: lex1.ll
	$(LEX) --header-file=lexheader.h -o $@ $<
yac1.tab.cc: yac1.yy
	$(YACC) -d $<
Also take the time to read the documentation of GNU make. You might want to use remake as remake -x to debug your Makefile.
You need to declare the yyparse function in the testlex.cc file:
int yyparse();
This is what is known as a function prototype, and tells the compiler that the function exists and can be called.
After looking a little closer at your source I now know the reason why the existing prototype didn't work: It's because you declared it as extern "C" but compiled the file as a C++ source. The extern "C" told the compiler that the yyparse function was an old C style function but then you continued to compile the source with a C++ compiler. This caused a name mismatch.