Syntax error, unexpected token, expecting end of file - C++

I get the following error when I run my parser binary (produced by compiling my Flex/Bison files):
error: syntax error, unexpected TKN_PRIMARY, expecting end of file
Here is the rule defined in my flex code:
<PRIMARY_MME_STATE>{number} {
    lexVal = YYText();
    std::cout << "PRIMARY MME --> " << lexVal << std::endl;
    yylval->strVal = new std::string(lexVal);
    return token::TKN_PRIMARYMME;
}
My understanding is that since the value of TKN_PRIMARY is zero (the value defined for END: %token END 0 "end of file"), the parser expects the END token rather than TKN_PRIMARY. Please comment on whether my understanding is correct, and also on how to tackle this issue.

If TKN_PRIMARY and END have the same value (or, in general, if any two different tokens have the same value), then the bison parser is going to act in unpredictable ways.
Quoting the bison manual:
It is generally best, however, to let Bison choose the numeric codes for all token types. Bison will automatically select codes that don't conflict with each other or with normal characters.
I think that's definitely the best way of dealing with the problem.
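In practice, that means dropping the explicit values from the %token declarations and letting Bison number everything except END itself. A minimal sketch (TKN_SECONDARYMME is a made-up placeholder for the rest of your tokens):

/* Only END is pinned to 0 (end of input); Bison assigns every other
   token an automatically chosen, non-conflicting code. */
%token END 0 "end of file"
%token TKN_PRIMARYMME
%token TKN_SECONDARYMME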

DDMathParser implicit multiplication not working

I use DDMathParser to solve formula expressions in Swift. The following code works fine; however, implicit multiplication doesn't. Reading the docs, it should work. So, what am I missing here?
My code:
...
substitutions.updateValue(3, forKey: "x")
let myString = "3$x"
do {
    let expression = try Expression(string: myString, operatorSet: operatorSet, options: myTRO, locale: myLocale)
    let result = try evaluator.evaluate(expression, substitutions: substitutions)
    print("expression is: \(expression), the result is: \(result)")
} catch {
    print("Error")
}
...
The code hits the catch and prints "Error". Using the string "3*$x" instead, the expression is calculated as expected.
DDMathParser author here.
So, the .invalidFormat error is thrown when the framework has a sequence of tokens and is looking for an operator in order to figure out what goes around it. If it still has tokens to resolve but can't find an operator, it throws the .invalidFormat error.
This implies that you have a 3.0 number token and a $x variable token, but no × multiplication token.
I see also that you're passing in a custom set of TokenResolverOptions (the myTRO variable). I'd guess that you're passing in an option set that does not include the .allowImplicitMultiplication value. If I try to parse 3$x without the .allowImplicitMultiplication resolver option, then I get the .invalidFormat error thrown.
OK, got it myself. As Dave DeLong mentioned, .allowImplicitMultiplication is included in the default options, but it gets lost when you create custom options without it. Since I want to use localized expressions (the decimal separator within the expression string is locale-dependent), I need to use the full form of the Expression initializer:
let expression = try Expression(string: ..., operatorSet: ..., options: ..., locale: ...)
In order to use the localized string option I defined let myLocale = NSLocale.current, but I accidentally also created a new operatorSet and new options and passed them to the Expression initializer. The right way is not to create a custom operatorSet and options, but to use the defaults within the Expression definition:
let expression = try Expression(string: expressionString, operatorSet: OperatorSet.default, options: TokenResolverOptions.default, locale: myLocale)
Dave DeLong did a really great job creating the DDMathParser framework. For newbies, though, it is very hard to get started with. The wiki section at DDMathParser is pretty basic and doesn't give details or examples for all the other great functionality DDMathParser provides.

Flex doesn't work for identifying errors in naming identifiers

I'm trying to make a lexical analyzer for a mini-language. One of the rules is that identifiers are not allowed to start with a digit.
Here are the regular expressions that define a number and an identifier.
NUMBER [+-]?[0-9]+
ID [a-zA-Z][a-zA-Z0-9_]*
and the rules defined in the .lxi file:
%%
{DELIMITATOR}   printf("Delimitator: %s\n", yytext);
{NUMBER}        printf("Number: %s\n", yytext);
{ID}            printf("Identifier: %s\n", yytext);
.               printf("Error: %s\n", yytext);
%%
The problem appears when in the input file, there are tokens that do not respect the rules for naming identifiers. For instance, for
a := 1abc
I get the following result:
Number: 1;
Identifier: abc;
Instead, I would like to receive an error message. Is there something I can do? I also tried using a trailing context when defining the numbers, but it doesn't seem to work.
Don't worry: the errors presented in your question are going to be detected and reported at the syntactic level (by bison, yacc, or similar).
Nevertheless, if you think errors like 1abc are very common, you can write a flex-specific rule to report them. Example:
[0-9]+[a-zA-Z_][a-zA-Z0-9_]*  fprintf(stderr, "invalid identifier '%s'\n", yytext);
[a-zA-Z]{40,}                 fprintf(stderr, "identifier too long '%s'\n", yytext);
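For context, here is a minimal sketch of a complete .l file built around such a rule (the DELIMITATOR rule is omitted for brevity). Because flex always prefers the longest match, the invalid-identifier pattern beats {NUMBER} on input like 1abc or 12abc:

%option noyywrap
NUMBER  [+-]?[0-9]+
ID      [a-zA-Z][a-zA-Z0-9_]*
%%
[0-9]+[a-zA-Z_][a-zA-Z0-9_]*  fprintf(stderr, "invalid identifier '%s'\n", yytext);
{NUMBER}                      printf("Number: %s\n", yytext);
{ID}                          printf("Identifier: %s\n", yytext);
[ \t\n]+                      ; /* skip whitespace */
.                             printf("Error: %s\n", yytext);
%%
int main(void) { return yylex(); }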

Storing the current line being analysed by flex

In my parser generated by flex, I would like to be able to store each line in the file, so that when reporting errors, I can show the user the line that the error occurred on.
I could of course do this using a vector and read in all lines from the file before/after lexing, but this would just add to the time needed to parse a file.
What I thought I could do instead is store the line whenever a newline character is matched, and insert the current line into a vector. So my question is: is there a variable/macro in flex that stores the current line? (Something like yyline, perhaps.)
Note: I am also using bison
By itself, lex/flex does not do what you ask. As noted, you want this for reporting error messages. (I do something like this in vile, the vi-like-emacs editor.)
With lex/flex, the only way to store the entire line is to record each token from the current line into your own line-buffer. That can be complicated, especially if your lexer has to handle multi-line content (such as comments or strings).
The yytext variable only shows you the most recently matched token (and yyleng, the corresponding length). If your lexer does a simple ECHO, that is a token just like the ones you pay attention to.
Reading the file in advance, as noted, is one way to simplify the problem. In vile, the lexers read via a function from the in-memory buffer rather than from an input stream. It bypasses the normal stream-handling logic by redefining the YY_INPUT macro, e.g.,
#define YY_INPUT(buf,result,max_size) result = flt_input(buf,max_size)
Likewise, ECHO is redefined (since the editor reads the results back rather than letting them go to the standard output):
#define ECHO flt_echo(yytext, yyleng)
and it traps errors detected by the lexer with another redefinition:
#define YY_FATAL_ERROR(msg) flt_failed(msg);
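A minimal, self-contained sketch of that in-memory approach (flt_input, flt_echo, and flt_failed are vile's own functions; the buffer_input stand-in and the source/source_pos names below are illustrative):

%{
#include <algorithm>
#include <cstring>
#include <string>

static std::string source;     /* the entire input, loaded up front */
static size_t source_pos = 0;  /* read position within source */

/* Hand flex up to max_size bytes from the in-memory buffer. */
static int buffer_input(char* buf, int max_size)
{
    size_t n = std::min(source.size() - source_pos, (size_t)max_size);
    std::memcpy(buf, source.data() + source_pos, n);
    source_pos += n;
    return (int)n;  /* returning 0 tells flex it reached end of input */
}
#define YY_INPUT(buf, result, max_size) ((result) = buffer_input((buf), (max_size)))
%}
%option noyywrap
%%
.|\n    ECHO;
%%
int main()
{
    source = "hello, world\n";  /* pretend the file was loaded here */
    return yylex();
}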
However you do this, bear in mind that the yylineno value reported for a given token is the line number as of the end of that token.
While it is nice to report the entire line in context in an error message, it is also useful to track the line and column number of each token -- various editors can deal with lines like this:
filename:line:col:message
If you build up your line-buffer by tracking tokens, it might be relatively simple to track the column on which each token begins as well.
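If you do build up a line-buffer by tracking tokens, here is a sketch of the idea (current_line and all_lines are illustrative names; YY_USER_ACTION runs before each rule's action, so every matched token lands in the buffer):

%{
#include <string>
#include <vector>

static std::string current_line;            /* text of the line being scanned */
static std::vector<std::string> all_lines;  /* completed lines, for error reports */

/* Runs before every rule's action: append the matched text to the buffer. */
#define YY_USER_ACTION current_line.append(yytext, yyleng);
%}
%option noyywrap yylineno
%%
\n       { current_line.pop_back();            /* drop the '\n' itself */
           all_lines.push_back(current_line);
           current_line.clear(); }
[^\n]+   { /* real token rules would go here */ }
%%
int main() { return yylex(); }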

Log parsing rules in Jenkins

I'm using the Jenkins log parser plugin to extract and display the build log.
The rule file looks like this:
# Compiler Error
error /(?i) error:/
# Compiler Warning
warning /(?i) warning:/
Everything works fine, but for some reason, at the end of the "Parsed Output Console" I see this message:
NOTE: Some bad parsing rules have been found:
Bad parsing rule: , Error:1
Bad parsing rule: , Error:1
This, I'm sure, is a trivial issue, but I'm not able to figure it out at the moment.
Please help :)
EDIT:
Based on Kobi's answer, and having looked into the "Parsing rules files" documentation, I fixed it this way (a single space after the colon), and it worked perfectly:
# Compiler Error
error /(?i)error: /
# Compiler Warning
warning /(?i)warning: /
The Log Parser Plugin does not support spaces in your pattern.
This can be clearly seen in their source code:
final String ruleParts[] = parsingRule.split("\\s");
String regexp = ruleParts[1];
They should probably have used .split("\\s", 2).
As an alternative, you can use \s, \b, or an escape sequence such as \u0020.
I had tried no spaces in the pattern, but that did not work. It turns out that the parsing rules file does not support empty lines. Once I removed the empty lines, I no longer got "Bad parsing rule: , Error:1". I think that since the line is empty, it doesn't echo any rule after the first colon. It would have been nice if the line number of the offending rule had been reported.

How to write a bison file to automatically use a token enumeration list defined in a C header file?

I am trying to build a parser with Bison/Yacc to be able to parse a stream of tokens produced by another module. The tokens are already listed in an enumeration type as follows:
// C++ header file
enum token_id {
TokenType1 = 0x10000000,
TokenType2 = 0x11000000,
TokenType3 = 0x11100000,
// ... and the list goes on, about 200-300 lines in total
};
I have gone through the Bison documentation many times, but I couldn't find a better solution than copying each token into the Bison file like this:
/* Bison/Yacc file */
%token TokenType1 0x10000000
%token TokenType2 0x11000000
%token TokenType3 0x11100000
//...
If I have to do it like that, it will become pretty hard to maintain the file when the other module's specification changes (which happens quite often).
Could you please tell me how to do this, or point me in the right direction (any idea/comment is welcome)? It would greatly help me! Thanks in advance.
Instead of doing:
/* Bison/Yacc file */
%token TokenType1 0x10000000
%token TokenType2 0x11000000
%token TokenType3 0x11100000
//...
You just need to include the file with the token types in the declarations part:
%{
#include "mytoken_enum.h"
%}
// ...
%token TokenType1
%token TokenType2
%token TokenType3
//...
EDIT: This cannot be done:
As you see from the numbers above, Bison just numbers the tokens sequentially, and the token codes are used (shifted) as indices into the parser lookup tables, simply for speed. So Bison does not support that, I feel sure, and it would not be easy to fit with the implementation model.
You just need a wrapper to convert the real tokens into yacc/bison tokens (e.g., via yylex()).
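A rough sketch of such a wrapper, assuming the other module exposes some get_next_module_token() call (that name, parser.tab.h, and the TOKENTYPEn Bison token names are all illustrative; the Bison names must differ from the enum's names to avoid a collision):

// yylex() translates the module's 0x10000000-style codes into the
// codes Bison generated for its own %token declarations.
#include <map>
#include "mytoken_enum.h"  // the module's enum token_id
#include "parser.tab.h"    // Bison-generated token codes (TOKENTYPE1, ...)

extern token_id get_next_module_token();  // assumed module API

int yylex(void)
{
    static const std::map<token_id, int> to_bison = {
        { TokenType1, TOKENTYPE1 },
        { TokenType2, TOKENTYPE2 },
        { TokenType3, TOKENTYPE3 },
        // ... one entry per token; this table could itself be generated
    };
    auto it = to_bison.find(get_next_module_token());
    return it == to_bison.end() ? 0 : it->second;  // 0 means end of input
}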
The obvious method would be a small utility to convert from one format to the other. If you're really making changes quite frequently, you might even consider storing the names and values in something like a SQL database, and write a couple of queries to produce the output in the correct format for each tool.
select token_name || ' = ' || token_number || ','
from token_table;

select '%token ' || token_name || ' ' || token_number
from token_table;
The first would require a bit of massaging, such as adding the "enum token_id {" at the beginning and "};" at the end, but you get the general idea. Of course, there are lots of alternatives -- XML, CSV, etc., but the general idea remains the same: store and edit as close to the raw data as possible, and automate adding the extra "stuff" necessary to keep the tools happy.
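As an illustration of the small-utility route (no database needed), here is a sketch that reads the header and emits %token lines directly; the regex only handles the simple "Name = 0xVALUE," entries shown above, and the program and file names are assumptions:

// gen_tokens: read an enum header and print one %token line per entry.
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char* argv[])
{
    if (argc < 2) { std::cerr << "usage: gen_tokens <header>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::string line;
    // Matches lines like "    TokenType1 = 0x10000000,"
    const std::regex entry(R"(\s*(\w+)\s*=\s*(0[xX][0-9a-fA-F]+)\s*,?\s*)");
    std::smatch m;
    while (std::getline(in, line))
        if (std::regex_match(line, m, entry))
            std::cout << "%token " << m[1] << ' ' << m[2] << '\n';
    return 0;
}

Running it over the header above would print "%token TokenType1 0x10000000" and so on, ready to paste into the grammar file.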