Flex regular expression question about bad character

I am new to flex regular expressions and I don't know why it keeps reporting a bad character. I am wondering whether "?!" exists in flex regular expressions. If not, how do I replace it with a correct one?
Here is my code:
%option noyywrap
%{
#include <assert.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
%}
/* ------------------- Rules space -------------------- */
%%
^aa(a|b)*$ {printf("Token1:%s\n" ,yytext);}
a{3}(a|b)*$ {printf("Token2:%s\n" ,yytext);}
a{2}(a|b)*$ {printf("Token3:%s\n" ,yytext);}
^(?:(a|b){2})+$ {printf("Token4:%s\n" ,yytext);}
^b*(ab*ab*)*$ {printf("Token5:%s\n" ,yytext);}
^(?:.{5})*$ {printf("Token6:%s\n" ,yytext);}
\b((?!aaa)\w)+\b {printf("Token7:%s\n" ,yytext);} // line 19: the error appears here
.|\n {} // default rule (always include to match all other strings)
%%
/* ----------------- User code space ------------------ */
int main(void)
{
yylex();
return 0;
}
Error

Related

Flex/Bison Markdown to HTML Program

This is for a homework assignment. The only parts I've edited myself are the definitions, rules, and tokens. What I have so far compiles successfully, but it gives me a segmentation fault when I run it on the Markdown file (.md), and because of that the HTML output is just a blank file.
%{
#define YYSTYPE char *
#include <string.h>
#include "miniMD2html.tab.h"
extern YYSTYPE yylval;
%}
%option yylineno
/* Flex definitions */
whitespace [ \t]+
newline [\n]+|{whitespace}[\n]+
textword [a-zA-Z:/.\-,\']+
integer [0-9]+
header #|##|###|####|#####
%%
{header} { return T_HEADER; }
{integer} { return T_INTEGER; }
{textword} { return T_TEXTWORD; }
{whitespace} { return T_BLANK; }
{newline} { return T_NEWLINE; }
%%
The generate functions are given in another file. Most of them just accept a char*; the generate_header function takes an int and a char*, and the generate_image function takes two char* and two ints. The grammar may look weird, but this is what was given in the assignment.
%{
#include "global.h"
#include "stdlib.h"
#include "stdio.h"
#define YYSTYPE char *
extern int yylex();
int yywrap();
int yyerror(const char*);
int yyparse();
extern FILE *yyin;
Html_Doc *html_doc;
%}
/* Define tokens here */
%token T_BLANK T_NEWLINE
%token T_HEADER T_INTEGER T_TEXTWORD
%% /* Grammar rules and actions follow */
s: mddoc;
mddoc: /*empty*/ | mddoc paragraph;
paragraph: T_NEWLINE {add_linebreak(html_doc);}
| pcontent T_NEWLINE {add_element(html_doc, $1); free($1);} ;
pcontent: header
| rftext {generate_paragraph($1);}
header: T_HEADER T_BLANK rftext {generate_header(strlen($1), $3);}
rftext: rftext T_BLANK rftextword {strappend($1, $3);}
| rftext rftextword {strappend($1, $2);}
| rftextword
rftextword: textnum | image | format
image: "![" text "](" text '=' T_INTEGER '#' T_INTEGER ')' {generate_image($2, $4, atoi($6), atoi($8));}
format: "**" text "**" {generate_bold($2);}
| '_' text '_' {generate_italic($2);}
| "**" format "**" {generate_bold($2);}
| '_' format '_' {generate_italic($2);}
text: text T_BLANK textnum {strappend($1, $3);}
| text textnum {strappend($1, $2);}
| textnum
textnum: T_TEXTWORD | T_INTEGER
%%
int main(int argc, char *argv[]) {
// yydebug = 1;
FILE *fconfig = fopen(argv[1], "r");
// make sure it is valid
if (!fconfig) {
printf("Error reading file!\n");
return -1;
}
html_doc = new_html_doc();
// set lex to read from file
yyin = fconfig;
int ret = yyparse();
output_result(html_doc);
del_html_doc(html_doc);
return ret;
}
int yywrap(){
return 1;
}
int yyerror(const char* s){
extern int yylineno;
extern char *yytext;
printf("error while parsing line %d: %s at '%s', ASCII code: %d\n", yylineno, s, yytext, (int)(*yytext));
return 1;
}
None of your flex rules ever set the value of yylval, so it will be NULL throughout. And so will all the references to semantic values ($n) in your grammar. Since most functions which take a char* assume that it is a valid string, it's pretty likely that one of them will soon try to examine the string value, and the fact that the pointer is NULL will certainly lead to a segfault.
In addition, there are both single character and quoted string tokens in your grammar, none of which can be produced by your scanner. So it's quite likely that the parser will stop with a syntax error as soon as one of the non-word characters is encountered in the input.
In the bison file, every rule should be terminated by a semicolon (;):
s: mddoc;
mddoc: /*empty*/ | mddoc paragraph;
paragraph: ...
Notice the ; after mddoc paragraph.
Those rules are terminated correctly, but the rules that follow are not.
Also, as @Rockcat has said, in the flex file you should add
yylval = strdup(yytext);
before returning each token to the bison file.
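The strdup() matters because yytext points into flex's own buffer and is only valid until the next token is scanned, so a pointer stored in yylval would later show a different token's text. The C sketch below simulates that with a single reused buffer; fake_scan() and semantic_value() are hypothetical stand-ins, not part of flex or bison.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for flex's yytext: one buffer that is overwritten by every match.
   (Simplified; real flex points yytext into its input buffer.) */
static char scan_buffer[64];

/* Hypothetical helper: "scan" one lexeme and return a yytext-like pointer. */
static const char *fake_scan(const char *lexeme) {
    assert(strlen(lexeme) < sizeof scan_buffer);
    strcpy(scan_buffer, lexeme);
    return scan_buffer;
}

/* What the answer suggests doing before each return: yylval = strdup(yytext).
   The caller (a bison action) owns the copy and must free() it. */
static char *semantic_value(const char *yytext) {
    return strdup(yytext);
}
```

If fake_scan("hello") is followed by fake_scan("world"), a pointer saved from the first call reads "world", while the strdup() copy still holds "hello". That heap copy is what the grammar's free($1) calls are meant to release.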

Flex, Bison, C++ all in Xcode

I'm working through Problems with reentrant Flex and Bison. It compiles and runs just fine on my machine. What I want to do, though, is make use of the C++ STL. Any time I try to include a C++ header, it says it can't be found. There are only a handful of questions about this on Google. Does anyone have a working example of this sort of setup, or a solution I might implement?
Any help would be greatly appreciated.
Thanks!
EDIT: For one reason or another, I have to add the include path of every header in the build settings. That must be due to the custom makefile in this person's example; it's above my pay grade. Anyway, I can now use STL libraries inside main.
What I really want to do is use flex/bison with C++, and if I try to include STL headers anywhere but main, I get the error "header not found".
I can include C headers just fine, though.
Here's an answer from the author of another answer in the linked topic.
I have adapted my example to work with C++.
The key points are:
I am using recent Flex / Bison: brew install flex and brew install bison. I am not sure whether the same will work with the flex/bison that ship with OS X/Xcode.
Generated flex/bison files should have C++ extensions (lexer.[hpp|mm], parser.[hpp|mm]) for Xcode to pick up the C++ code.
There is an Xcode Build Phase that runs make.
All the relevant files follow below, but I recommend checking out the example project.
main.mm's code is
#include "parser.hpp"
#include "lexer.hpp"
extern YY_BUFFER_STATE yy_scan_string(const char * str);
extern void yy_delete_buffer(YY_BUFFER_STATE buffer);
ParserConsumer *parserConsumer = [ParserConsumer new];
char input[] = "RAINBOW UNICORN 1234 UNICORN";
YY_BUFFER_STATE state = yy_scan_string(input);
yyparse(parserConsumer);
yy_delete_buffer(state);
Lexer.lm:
%{
#include "ParserConsumer.h"
#include "parser.hpp"
#include <iostream>
#include <cstdio>
int yylex(void);
void yyerror(id <ParserConsumer> consumer, const char *msg);
%}
%option header-file = "./Parser/Generated Code/lexer.hpp"
%option outfile = "./Parser/Generated Code/lexer.mm"
%option noyywrap
NUMBER [0-9]+
STRING [A-Z]+
SPACE \x20
%%
{NUMBER} {
yylval.numericValue = (int)strtoul(yytext, NULL, 10);
std::cout << "Lexer says: Hello from C++\n";
printf("[Lexer, number] %s\n", yytext);
return Token_Number;
}
{STRING} {
yylval.stringValue = strdup(yytext);
printf("[Lexer, string] %s\n", yytext);
return Token_String;
}
{SPACE} {
// Do nothing
}
<<EOF>> {
printf("<<EOF>>\n");
return 0;
}
%%
void yyerror (id <ParserConsumer> consumer, const char *msg) {
printf("%s\n", msg);
abort();
}
Parser.ym:
%{
#include <iostream>
#include <cstdio>
#include "ParserConsumer.h"
#include "parser.hpp"
#include "lexer.hpp"
int yylex();
void yyerror(id <ParserConsumer> consumer, const char *msg);
%}
%output "Parser/Generated Code/parser.mm"
%defines "Parser/Generated Code/parser.hpp"
//%define api.pure full
%define parse.error verbose
%parse-param { id <ParserConsumer> consumer }
%union {
char *stringValue;
int numericValue;
}
%token <stringValue> Token_String
%token <numericValue> Token_Number
%%
/* http://www.tldp.org/HOWTO/Lex-YACC-HOWTO-6.html 6.2 Recursion: 'right is wrong' */
tokens: /* empty */
| tokens token
token:
Token_String {
std::cout << "Parser says: Hello from C++\n";
printf("[Parser, string] %s\n", $1);
[consumer parserDidParseString:$1];
free($1);
}
| Token_Number {
printf("[Parser, number]\n");
[consumer parserDidParseNumber:$1];
}
%%
Makefile:
generate-parser: clean flex bison
clean:
	rm -rf './Parser/Generated Code'
	mkdir -p './Parser/Generated Code'
flex:
	# brew install flex
	/usr/local/bin/flex ./Parser/Lexer.lm
bison:
	# brew install bison
	/usr/local/bin/bison -d ./Parser/Parser.ym

Undefined Definition in LEX program

I am trying to create a lex scanner that reads through a header file, finds lexical errors, and writes them to a text output, but I keep getting an "undefined definition" error on the lines that use {line} and {punc}. I'm completely new to lex, so I have not been able to identify what is missing. Here is the part of my code so far that has the errors:
%{
#include <stdio.h>
#include <ctype.h>
#include "tokens.h"
%}
%option noyywrap
ws [ \t\r]+
%%
[ \t\n] ;
. printf("Unexpected Character\n");
: return COLON;
{ws} { ECHO; }
{line} { ECHO; Listing::nextLine();}
"<" { ECHO; return(RELOP); }
begin { ECHO; return(BEGIN_); }
{punc} { ECHO; return yytext[0]; }
. { ECHO; Listing::appendError(LEXICAL, yytext); }
%%
int main()
{
yylex();
}
You have to define line and punc in the definitions section (before the first %%), just as you did with ws. The error is telling you that flex couldn't expand {line} and {punc} because there is no definition for those two names.

Include generated code by flex and bison

I'm working with Flex and Bison in C++. I am learning to use these tools, and the best way to start is by writing a simple calculator. After generating the application (the executable) from my calc.y and calc.l files, I can run the .exe file and use it, but now I want to include it in a C++ file so I can use it in my application, and I can't. I think it's my fault: either I'm including the generated file incorrectly or I'm generating code that can't be imported.
main.cpp
#include <iostream>
extern "C" {
#include "y.tab.h"
}
int main ( int argc, char *argv[] ) {
yyparse();
printf(elementos);
return 0;
}
calc.l
%{
#include "y.tab.h"
#include <stdlib.h>
void yyerror(char *);
%}
%%
[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}
[-+()\n] {
return *yytext;
}
[ \t] ;
. {
yyerror("Invalid character.");
}
%%
int yywrap(void) {
return 1;
}
calc.y
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
int sym[26];
int elementos = 0;
%}
%token INTEGER VARIABLE
%left '+' '-'
%left '*' '/'
%%
program:
program expr '\n' { printf("%d\n", $2 ); }
|
;
statement:
expr { printf("%d\n", $1); }
| VARIABLE '=' expr { sym[$1] = $3; }
;
expr:
INTEGER { $$ = $1; }
| expr '+' expr { $$ = $1 + $3; elementos = elementos + 1;}
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '(' expr ')' { $$ = $2; }
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(void) {
yyparse();
return 0;
}
The y.tab.h is generated by bison. When I'm trying to compile main.cpp I'm getting an error:
command: gcc main.cpp -o main.exe
result: main.cpp: In function 'int main(int, char**)':
main.cpp:8:10: error: 'yyparse' was not declared in this scope
main.cpp:9:9: error: 'elementos' was not declared in this scope
How can I fix it?
I'm using gcc version 4.7.2, bison 2.4.1 and 2.5.4 on windows 8.1.
Thanks!
EDIT:
The y.tab.h file is:
/* Tokens. */
#ifndef YYTOKENTYPE
# define YYTOKENTYPE
/* Put the tokens into the symbol table, so that GDB and other debuggers
know about them. */
enum yytokentype {
INTEGER = 258,
VARIABLE = 259
};
#endif
/* Tokens. */
#define INTEGER 258
#define VARIABLE 259
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
# define YYSTYPE_IS_TRIVIAL 1
# define yystype YYSTYPE /* obsolescent; will be withdrawn */
# define YYSTYPE_IS_DECLARED 1
#endif
extern YYSTYPE yylval;
There's no "elementos" variable in y.tab.h, but looking in the generated y.tab.c file, I found that it is defined there!
You have a number of problems:
Bison and Flex generate C code, which you then need to compile and link with your program. Your question gives no indication that you have done this.
If you want to be able to use the elementos variable in your main.cpp file, then you need to declare it. It may be defined somewhere else, but the compiler doesn't know that when it compiles main.cpp. Add this line inside the extern "C" part: extern int elementos; The same goes for yyparse, which the y.tab.h shown above does not declare either, so add int yyparse(void); there too.
You have two different main functions.
In main.cpp, you #include <iostream>, but then use printf, which comes from <stdio.h> (<cstdio> in C++).
The call to printf is wrong. It needs a format string.
Bison shows several warnings, which you probably need to read and do something about if you want your program to work.

Using lex generated source code in another file

I would like to use the code generated by lex in other code that I have, but all the examples I have seen embed the main function inside the lex file, not the other way around.
Is it possible to use (include) the C file generated by lex in other code, so that I can have something like this (not necessarily the same)?
#include<something>
int main(){
Lexer l = Lexer("some string or input file");
while (l.has_next()){
Token * token = l.get_next_token();
//somecode
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
This is what I would start with:
Note: this is an example of using the C interface.
To use the C++ interface, add %option c++ (see below).
Test.lex
IdentPart1 [A-Za-z_]
Identifier {IdentPart1}[A-Za-z_0-9]*
WHITESPACE [ \t\r\n]
%option noyywrap
%%
{Identifier} {return 257;}
{WHITESPACE} {/* Ignore */}
. {return 258;}
%%
// This is the bit you want.
// It is best just to put this at the bottom of the lex file.
// Functions are extern by default, so you can create a header file that
// declares them extern and include that header file in your code (see Lexer.h).
void* setUpBuffer(char const* text)
{
YY_BUFFER_STATE buffer = yy_scan_string(text);
yy_switch_to_buffer(buffer);
return buffer;
}
void tearDownBuffer(void* buffer)
{
yy_delete_buffer((YY_BUFFER_STATE)buffer);
}
Lexer.h
#ifndef LOKI_A_LEXER_H
#define LOKI_A_LEXER_H
#include <string>
extern int yylex();
extern char* yytext;
extern int yyleng;
// Here is the interface to the lexer you set up above
extern void* setUpBuffer(char const* text);
extern void tearDownBuffer(void* buffer);
class Lexer
{
std::string token;
std::string text;
void* buffer;
public:
Lexer(std::string const& t)
: text(t)
{
// Use the interface to set up the buffer
buffer = setUpBuffer(text.c_str());
}
~Lexer()
{
// Tear down your interface
tearDownBuffer(buffer);
}
// Don't use RAW pointers
// This is only a quick and dirty example.
bool nextToken()
{
int val = yylex();
if (val != 0)
{
token = std::string(yytext, yyleng);
}
return val;
}
std::string const& theToken() const {return token;}
};
#endif
main.cpp
#include "Lexer.h"
#include <iostream>
int main()
{
Lexer l("some string or input file");
// Did not like your has_next() interface.
// Just call nextToken() until it fails.
while (l.nextToken())
{
std::cout << l.theToken() << "\n";
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
Build
> flex test.lex
> g++ main.cpp lex.yy.c
> ./a.out
some
string
or
input
file
>
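The question asked for a Token object holding both the token type and the lexeme, while the wrapper above keeps only the text. A minimal C sketch of such a token, filled from the value yylex() returns together with yytext and yyleng; Token, make_token() and free_token() are hypothetical names, not part of flex.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token type from the question: token kind plus its lexeme. */
typedef struct {
    int type;     /* value returned by yylex(), e.g. 257 for an identifier */
    char *lexeme; /* heap copy of the matched text, or NULL on allocation failure */
} Token;

/* Build a Token from yylex()'s return value and yytext/yyleng. The text is
   copied because yytext is overwritten by the next match. */
Token make_token(int type, const char *text, size_t len) {
    assert(text != NULL);
    Token t = { type, malloc(len + 1) };
    if (t.lexeme != NULL) { /* callers should check for NULL */
        memcpy(t.lexeme, text, len);
        t.lexeme[len] = '\0'; /* yyleng excludes the terminator */
    }
    return t;
}

/* Release the lexeme; safe to call more than once. */
void free_token(Token *t) {
    free(t->lexeme);
    t->lexeme = NULL;
}
```

Inside a loop like the one in nextToken(), one could call make_token(val, yytext, (size_t)yyleng) and free_token() once the token is no longer needed. Copying exactly yyleng bytes mirrors std::string(yytext, yyleng) in the C++ wrapper.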
Alternatively, you can use the C++ interface to flex (it's experimental):
test.lex
%option c++
IdentPart1 [A-Za-z_]
Identifier {IdentPart1}[A-Za-z_0-9]*
WHITESPACE [ \t\r\n]
%%
{Identifier} {return 257;}
{WHITESPACE} {/* Ignore */}
. {return 258;}
%%
// Note this needs to be here
// If you define no yywrap() in the options it gets added to the header file
// which leads to multiple definitions if you are not careful.
int yyFlexLexer::yywrap() { return 1;}
main.cpp
#include "MyLexer.h"
#include <iostream>
#include <sstream>
int main()
{
std::istringstream data("some string or input file");
yyFlexLexer l(&data, &std::cout);
while (l.yylex())
{
std::cout << std::string(l.YYText(), l.YYLeng()) << "\n";
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
build
> flex --header-file=MyLexer.h test.lex
> g++ main.cpp lex.yy.cc
> ./a.out
some
string
or
input
file
>
Sure. I'm not sure about the generated class; we use the C generated
parsers, and call them from C++. Or you can insert any sort of wrapper
code you want in the lex file, and call anything there from outside of
the generated file.
The keywords are %option reentrant or %option c++.
As an example here's the ncr2a scanner:
/** ncr2a_lex.l: Replace all NCRs by corresponding printable ASCII characters. */
%%
&#(1([01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]); { /* accept 32..126 */
/** `+2` skips '&#', `atoi()` ignores ';' at the end */
fputc(atoi(yytext + 2), yyout); /* non-recursive version */
}
The scanner code can be left unchanged.
Here the program that uses it:
/** ncr2a.c */
#include "ncr2a_lex.h"
typedef struct {
int i,j; /** put here whatever you need to keep extra state */
} State;
int main () {
yyscan_t scanner;
State my_custom_data = {0,0};
yylex_init(&scanner);
yyset_extra(&my_custom_data, scanner);
yylex(scanner);
yylex_destroy(scanner);
return 0;
}
To build the ncr2a executable:
flex -R -oncr2a_lex.c --header-file=ncr2a_lex.h ncr2a_lex.l
cc -c -o ncr2a_lex.o ncr2a_lex.c
cc -o ncr2a ncr2a_lex.o ncr2a.c -lfl
Example
$ echo 'three colons &#58;&#58;&#58;' | ./ncr2a
three colons :::
This example uses stdin/stdout as input/output and it calls yylex() once.
To read from a file instead, set the scanner's input stream after yylex_init():
yyset_in(fopen("input.txt", "r"), scanner);
@Loki Astari's answer shows how to read from a string (in the reentrant API: buffer = yy_scan_string(text, scanner); yy_switch_to_buffer(buffer, scanner)).
To have yylex() return once per token, add a return statement to each rule in the *.l file that yields a complete token.
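To make the ncr2a rule's behaviour concrete, here is a plain C sketch of the same transformation without flex. It accepts exactly what &#(1([01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]); accepts (two or three digits, no leading zero, value 32 to 126) and copies everything else unchanged, as flex's default rule would. decode_ncrs() is a hypothetical helper, not part of the original program.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy `in` to `out`, replacing every NCR the ncr2a rule accepts with the
   character it encodes. `out` needs room for strlen(in) + 1 bytes, since the
   output is never longer than the input. */
void decode_ncrs(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        /* "&#" followed by a non-zero first digit */
        if (in[0] == '&' && in[1] == '#' && in[2] >= '1' && in[2] <= '9') {
            const char *p = in + 2;
            int value = 0, digits = 0;
            while (digits < 3 && isdigit((unsigned char)*p)) {
                value = value * 10 + (*p - '0');
                p++;
                digits++;
            }
            if (digits >= 2 && *p == ';' && value >= 32 && value <= 126) {
                *out++ = (char)value; /* like fputc(atoi(yytext + 2), yyout) */
                in = p + 1;           /* skip past the ';' */
                continue;
            }
        }
        *out++ = *in++; /* not an accepted NCR: copy one character */
    }
    *out = '\0';
}
```

For example, "three colons &#58;&#58;&#58;" becomes "three colons :::", while &#127; (out of range) and &#032; (leading zero) are left as they are, just as the flex rule leaves them to the default echo.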