Bison: line number included in the error messages - C++

OK, so I suppose my question is quite self-explanatory.
I'm currently building a parser in Bison, and I want to make error reporting somewhat better.
Currently, I've set %define parse.error verbose (which actually gives messages like syntax error, unexpected ***********************, expecting ********************).
All I want is to add some more information to the error messages, e.g. the line number (in the input/file/etc.).
My current yyerror (well, nothing... unusual... lol):
void yyerror(const char *str)
{
    fprintf(stderr, "\x1B[35mInterpreter : \x1B[37m%s\n", str);
}
P.S.
I've gone through the latest Bison documentation, but I seem quite lost...
I've also had a look at the %locations directive, which most likely is very close to what I need; however, I still haven't found a complete working example and I'm not sure how it is meant to be used.

So, here I am with a step-by-step solution:
We add the %locations directive in our grammar file (between %} and the first %%).
We make sure that our lexer file includes our parser's header (e.g. #include "mygrammar.tab.h") at the top.
We add %option yylineno in our lexer file (between %} and the first %%).
And now, in our yyerror function (which will supposedly be in our lexer file), we may freely use yylineno (= the current line of the file being processed):
void yyerror(const char *str)
{
    fprintf(stderr, "Error | Line: %d\n%s\n", yylineno, str);
}
Yep. Simple as that! :-)
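For reference, here is a minimal sketch of how the pieces can fit together. The file names, the toy grammar and the NUM token are made up for illustration; only the %locations, #include "mygrammar.tab.h", %option yylineno and yyerror parts come from the steps above (here yyerror lives in the grammar file, so yylineno is declared extern):

/* mygrammar.y -- hypothetical grammar file */
/* build: bison -d mygrammar.y && flex mylexer.l && cc mygrammar.tab.c lex.yy.c */
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *str);
extern int yylineno;            /* maintained by Flex because of %option yylineno */
%}
%define parse.error verbose
%locations
%token NUM
%%
input : /* empty */ | input NUM ;
%%
void yyerror(const char *str)
{
    fprintf(stderr, "Error | Line: %d\n%s\n", yylineno, str);
}
int main(void)
{
    return yyparse();
}

/* mylexer.l -- hypothetical scanner */
%option yylineno
%option noyywrap
%{
#include "mygrammar.tab.h"
%}
%%
[0-9]+      { return NUM; }
[ \t\n]     { /* skip whitespace; yylineno is updated automatically */ }
.           { return yytext[0]; /* anything else triggers a syntax error */ }
%%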

What worked for me was adding extern int yylineno in the .ypp file:
/* parser.ypp */
%{
extern int yylineno;
%}
/* scanner.lex */
...
%option yylineno

Bison ships with a number of examples to demonstrate its features; see /usr/local/share/doc/bison/examples on your machine (where the prefix /usr/local depends on your configuration).
These examples in particular might be of interest to you:
lexcalc uses precedence directives and location tracking. It uses Flex to generate the scanner.
bistromathic demonstrates best practices when using Bison:
Its hand-written scanner tracks locations.
Its interface is pure.
It uses %param to pass user information to the parser and scanner.
Its scanner uses the error token to signal lexical errors and enter error recovery.
Its interface is "incremental", well suited for interaction: it uses the push-parser API to feed the parser with the incoming tokens.
It features an interactive command line with completion based on the parser state, using yyexpected_tokens.
It uses Bison's standard catalog for internationalization of generated messages.
It uses a custom syntax error with location, lookahead correction and token internationalization.
Error messages quote the source with squiggles that underline the error:
> 123 456
1.5-7: syntax error: expected end of file or + or - or * or / or ^ before number
    1 | 123 456
      |     ^~~
It supports debug traces with semantic values.
It uses named references instead of the traditional $1, $2, etc.
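If you just want a taste of %locations without reading through the shipped examples, here is a minimal sketch of one common way to wire it up with Flex: an impure parser, with the scanner filling in the global yylloc through YY_USER_ACTION. The file name and the column bookkeeping are illustrative, not taken from the bistromathic code:

/* In the grammar file: */
%locations
%define parse.error verbose

/* ... grammar elided ... */

void yyerror(const char *msg)
{
    /* with the default (impure) parser, yylloc is a global filled in by the
       scanner; when yyerror runs it holds the location of the offending token */
    fprintf(stderr, "%d.%d-%d.%d: %s\n",
            yylloc.first_line, yylloc.first_column,
            yylloc.last_line, yylloc.last_column, msg);
}

/* In the Flex file, before the first %%: */
%option yylineno
%{
#include "mygrammar.tab.h"
static int yycolumn = 1;
/* run before every matched rule: record where this token starts and ends */
#define YY_USER_ACTION \
    yylloc.first_line = yylloc.last_line = yylineno; \
    yylloc.first_column = yycolumn; \
    yylloc.last_column = yycolumn + yyleng - 1; \
    yycolumn += yyleng;
%}
/* remember to reset yycolumn to 1 in the rule that matches \n */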

Related

'Assert Failed' message incomplete using CppUnit and TFS2015

Using: MSTest / CppUnit / TFS2015 / VS2013 / C++
I'm debugging a test that runs fine locally and fails on the build machine (which I don't have access to). This morning I sat down and was presented with almost all of my tests passing -- except one. The test happens to be comparing two rather large strings and the (usually) very helpful Assert failed. Expected:<... never made it to the Actual:<... part because the string was too long. It's just a simple: Assert::AreEqual(expectedStr, actualStr);.
Right now my workaround is to write a file to a network path that I have access to from within the test (which is already an integration type test luckily -- but still...). Oh -- and did I mention that I have to run a build that will take 40 minutes even if I set Clean Workspace to None in my build process parameters to even get the test to run? That's a whole other question for another post =/.
Is there a way to look at the full results of a test assertion failure (without, for example, a string comparison being cut off)? A test run log file maybe?
According to your description, you want more expressive assertion failure messages in C++. This case may help you:
"
A common solution for this problem is to create an assert macro. For an example see this question. The final form of their macro in that answer was the following:
#define dbgassert(EX,...) \
(void)((EX) || (realdbgassert (#EX, __FILE__, __LINE__, ## __VA_ARGS__),0))
In your case, the realdbgassert would be a function that prints any relevant information to stderr or other output console, and then calls the assert function itself. Depending on how much information you want, you could also do a stack dump, or log any other relevant information that will help you identify the issue. However, it can be as simple as passing a printf-esque format string, and relevant parameter value(s).
Note that if your compiler doesn't support variadic macros, you can create macros that take a specific number of parameters instead. This is slightly more cumbersome, but an option if your compiler lacks the support, e.g.:
#define dbgassert0(EX) \ ...
#define dbgassert1(EX,p0) \ ...
#define dbgassert2(EX,p0,p1) \ ...
"

How to do proper error handling in BNFC? (C++, Flex, Bison)

I'm making a compiler in BNFC and it's got to a stage where it already compiles some stuff and the code works on my device. But before shipping it, I want my compiler to return proper error messages when the user tries to compile an invalid program.
I found how Bison can write errors on the stderr stream, and I'm able to catch those. Now suppose the user's code has no syntax error but just references an undefined variable: I'm able to catch this in my visitor, but I can't know what the line number was. How can I find the line number?
In bison you can access the starting and ending position of the current expression using the variable @$, which contains a struct with the members first_column, first_line, last_column and last_line. Similarly, @1 etc. contain the same information for the sub-expressions $1 etc., respectively.
In order to have access to the same information later, you need to write it into your AST. So add a field to your AST node types to store the location, and then set that field when creating the node in your bison file.
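A sketch of what that can look like in a Bison rule, assuming locations are enabled; the Node type, mkNode constructor and the line field are hypothetical names (with BNFC you would adapt this to the generated grammar and AST classes):

expr : expr '+' expr
         {
             $$ = mkNode(ADD, $1, $3);   /* hypothetical AST constructor */
             $$->line = @$.first_line;   /* remember where this expression started */
         }
     ;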
(The previous answer is richer,) but in some simple parsers it is enough to declare
%option yylineno
in flex and print it in yyerror:
void yyerror(const char *s)
{
    /* yylineno and yytext come from the Flex-generated scanner;
       declare them extern if yyerror lives in the parser file */
    fprintf(stderr, "ERROR (line %d): before '%s'\n- %s", yylineno, yytext, s);
}
Sometimes that helps...

Convert C++ log source snippet to Windows Phone C++/CX

I just started developing for Windows Phone and I'm stuck with one piece of existing code I need to maintain. It's a macro from a logging lib that is used in many places of the existing code.
This is the macro:
#define LOG_FORMAT_FUNCTION(fmtarg, firstvararg) __attribute__((__format__ (__printf__, fmtarg, firstvararg)))
And this is a method definition that fails to use the above macro with the error "{ expected" (in German: "Es wurde ein '{' erwartet."):
void LogTrace_s(const char* category, const char* format, ...) LOG_FORMAT_FUNCTION(2, 3);
Can you help me get rid of the error? I'd also like to know what exactly the macro does.
Edit: After reading this here, I now understand that this macro is good for error-checking formatted strings. Now that I know, I need it even more. But I still have no clue how to translate this to MS C++.
Yes, you CAN just omit it. Use
#ifdef _MSC_VER
#define LOG_FORMAT_FUNCTION(fmtarg, firstvararg)
#endif
It is annotating the function with extra information to help gcc give you better warnings. It does not change the behavior of the code in any way.
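One way to keep the GCC behaviour while compiling cleanly under MSVC is to make the whole definition conditional. This is a sketch; the compiler checks are an assumption to adjust to your toolchain, and the macro and function come from the question:

// __attribute__((format(printf, ...))) only exists on GCC/Clang; expand the
// macro to nothing on other compilers (e.g. MSVC for Windows Phone).
#if defined(__GNUC__) || defined(__clang__)
#define LOG_FORMAT_FUNCTION(fmtarg, firstvararg) \
    __attribute__((__format__(__printf__, fmtarg, firstvararg)))
#else
#define LOG_FORMAT_FUNCTION(fmtarg, firstvararg)
#endif

void LogTrace_s(const char* category, const char* format, ...) LOG_FORMAT_FUNCTION(2, 3);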

How to strip C++ style single line comments (`// ...`)

For a small DSL I'm writing, I'm looking for a regex to match a comment string at the end of the line, like the // syntax of C++.
The simple case:
someVariable = 12345; // assignment
Is trivial to match but the problem starts when I have a string in the same line:
someFunctionCall("Hello // world"); // call with a string
The // in the string should not match as a comment.
EDIT - The thing that compiles the DSL is not mine. It's a black box as far as I'm concerned, which I don't want to change, and it doesn't support comments. I just want to add a thin wrapper to make it support comments.
EDIT
Since you are effectively preprocessing a sourcefile, why don't you use an existing preprocessor? If the language is sufficiently similar to C/C++ (especially regarding quoting and string literals), you will be able to just use cpp -P:
echo 'int main() { char* sz="Hello//world"; /*profit*/ } // comment' | cpp -P
Output: int main() { char* sz="Hello//world"; }
Other ideas:
Use a proper lexer/parser instead
Have a look at
Coco/R (available for Java, C++, C#, etc.)
ANTLR (idem)
Boost Spirit (with Spirit Lex to make it even easier to strip the comments)
All suites come with sample grammars that parse C, C++ or a subset thereof
shoosh wrote:
EDIT - The thing that compiles the DSL is not mine. It's a black box as far as I'm concerned, which I don't want to change, and it doesn't support comments. I just want to add a thin wrapper to make it support comments.
In that case, create a very simple lexer that matches one of three tokens:
// ... comments
string literals: " ... "
or, if none of the above matches, match any single character
Now, while you iterate over these three different types of tokens, simply print tokens (2) and (3) to stdout (or to a file) to get the uncommented version of your source file.
A demo with GNU Flex:
example input file, in.txt:
someVariable = 12345; // assignment
// only a comment
someFunctionCall("Hello // world"); // call with a string
someOtherFunctionCall("Hello // \" world"); // call with a string and
// an escaped quote
The lexer grammar file, demo.l:
%%
"//"[^\r\n]* { /* skip comments */ }
"\""([^"]|[\\].)*"\"" {printf("%s", yytext);}
. {printf("%s", yytext);}
%%
int main(int argc, char **argv)
{
while(yylex() != 0);
return 0;
}
And to run the demo, do:
flex demo.l
cc lex.yy.c -lfl
./a.out < in.txt
which will print the following to the console:
someVariable = 12345;
someFunctionCall("Hello // world");
someOtherFunctionCall("Hello // \" world");
EDIT
I'm not really familiar with C/C++, and just saw #sehe's recommendation of using a pre-processor. That looks to be a far better option than creating your own (small) lexer. But I think I'll leave this answer since it shows how to handle this kind of stuff if no pre-processor is available (for whatever reason: perhaps cpp doesn't recognise certain parts of the DSL?).

lex (flex) generated program not parsing whole input

I have a relatively simple lex/flex file and have been running it with flex's debug flag to make sure it's tokenizing properly. Unfortunately, I always run into one of two problems: either the program that flex generates just gives up silently after a couple of tokens, or the rule I'm using to recognize characters and strings isn't called and the default rule is called instead.
Can someone point me in the right direction? I've attached my flex file and sample input / output.
Edit: I've found that the generated lexer stops after a specific rule: "cdr". This is more detailed, but also much more confusing. I've posted a shortened, modified lex file.
/* lex file*/
%option noyywrap
%option nodefault
%{
enum tokens{
CDR,
CHARACTER,
SET
};
%}
%%
"cdr" { return CDR; }
"set" { return SET; }
[ \t\r\n] /*Nothing*/
[a-zA-Z0-9\\!##$%^&*()\-_+=~`:;"'?<>,\.] { return CHARACTER; }
%%
Sample input:
set c cdra + cdr b + () ;
Complete output from running the input through the generated parser:
--(end of buffer or a NUL)
--accepting rule at line 16 ("set")
--accepting rule at line 18 (" ")
--accepting rule at line 19 ("c")
--accepting rule at line 18 (" ")
--accepting rule at line 15 ("cdr")
Any thoughts? The generated program is giving up after half of the input! (for reference, I'm doing input by redirecting the contents of a file to the generated program).
When generating a lexer that's standalone (that is, one whose tokens are not defined in bison/yacc), you typically write an enum at the top of the file defining your tokens. However, the main loop of a lex program, including the main loop generated by default, looks something like this:
while( token = yylex() ){
...
This is fine, until your lexer matches the rule that appears first in the enum, in this specific case CDR. Since enums by default start at zero, yylex() returns 0 for that token, which causes the while loop to end. Renumbering your enum will solve the issue.
enum tokens {
    CDR = 1,
    CHARACTER,
    SET
};
Short version: when defining tokens by hand for a lexer, start with 1 not 0.
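A sketch of the standalone driver this implies, placed in the user-code section after the second %% of the lex file (the loop shape comes from the snippet above; the printing is illustrative):

/* Keep reading tokens until yylex() returns 0 (end of input). Because the
   enum now starts at 1, a real token can never be confused with that 0. */
int main(void)
{
    int token;
    while ((token = yylex()) != 0) {
        printf("token %d, text '%s'\n", token, yytext);
    }
    return 0;
}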
This rule
[-+]?([0-9*\.?[0-9]+|[0-9]+\.)([Ee][-+]?[0-9]+)?
seems to be missing a closing bracket just after the first 0-9; it should presumably read
[-+]?([0-9]*\.?[0-9]+|[0-9]+\.)([Ee][-+]?[0-9]+)?
I couldn't begin to guess how flex would respond to the original.
The rule I usually use for symbol names is based on [a-zA-Z$_]; this is like your unquoted strings, except that I usually allow numbers inside symbols as long as the symbol doesn't start with a number:
[a-zA-Z$_]([a-zA-Z$_]|[0-9])*
A character is just a short symbol. I don't think it needs to have its own rule, but if it does, then you need to ensure that the string rule requires at least 2 characters.
[a-zA-Z$_]([a-zA-Z$_]|[0-9])+
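Put together, the corrected number rule and the symbol rule might look like this in the flex rules section (a sketch; NUMBER and SYMBOL are placeholder token names, not part of the question's enum):

[-+]?([0-9]*\.?[0-9]+|[0-9]+\.)([Ee][-+]?[0-9]+)?   { return NUMBER; }   /* numeric literal */
[a-zA-Z$_]([a-zA-Z$_]|[0-9])*                       { return SYMBOL; }   /* identifier/symbol */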