How to do proper error handling in BNFC? (C++, Flex, Bison)

I'm making a compiler in BNFC and it's got to a stage where it already compiles some stuff and the code works on my device. But before shipping it, I want my compiler to return proper error messages when the user tries to compile an invalid program.
I found out how Bison writes errors to the stderr stream, and I'm able to catch those. Now suppose the user's code has no syntax error but references an undefined variable. I can catch this in my visitor, but I don't know what the line number was. How can I find the line number?

In bison you can access the starting and ending position of the current expression using the variable @$, which contains a struct with the members first_column, first_line, last_column and last_line. Similarly @1 etc. contain the same information for the sub-expressions $1 etc. respectively.
In order to have access to the same information later, you need to write it into your ast. So add a field to your AST node types to store the location and then set that field when creating the node in your bison file.
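As a rough illustration of the "store it in the AST" idea, here is a minimal C++ sketch. The names SourceLoc, VarRef and check_var are made up for this example and are not something BNFC or Bison generates; the assumption is that the Bison action building the node copies the fields of @1 (or @$) into loc.

```cpp
#include <set>
#include <string>

// Hypothetical location record mirroring the four members of Bison's
// location struct (first_line, first_column, last_line, last_column).
struct SourceLoc {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
};

// Hypothetical AST node for a variable reference. The assumption is that
// the Bison action creating the node fills `loc` from @1 or @$.
struct VarRef {
    std::string name;
    SourceLoc loc;
};

// Visitor-style check: returns true if the variable is defined. Otherwise
// writes a message containing the stored line and column into `error`.
bool check_var(const VarRef& ref, const std::set<std::string>& defined,
               std::string& error) {
    if (defined.count(ref.name) != 0) {
        return true;
    }
    error = "line " + std::to_string(ref.loc.first_line) +
            ", column " + std::to_string(ref.loc.first_column) +
            ": undefined variable '" + ref.name + "'";
    return false;
}
```

The point of the design is that the visitor never has to ask the parser anything: by the time it runs, the position is just another field on the node.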

(The previous answer is richer.) But in some simple parsers, if we declare
%option yylineno
in flex, and print it in yyerror,
void yyerror(const char *s) {
fprintf(stderr,"ERROR (line %d):before '%s'\n-%s",yylineno, yytext,s);
}
sometimes it helps...

Related

Bison : Line number included in the error messages

OK, so I suppose my question is quite self-explanatory.
I'm currently building a parser in Bison, and I want to make error reporting somewhat better.
Currently, I've set %define parse.error verbose (which actually gives messages like syntax error, unexpected ***********************, expecting ********************).
All I want is to add some more information in the error messages, e.g. line number (in input/file/etc)
My current yyerror (well nothing... unusual... lol) :
void yyerror(const char *str)
{
fprintf(stderr,"\x1B[35mInterpreter : \x1B[37m%s\n",str);
}
P.S.
I've gone through the latest Bison documentation, but I seem quite lost...
I've also had a look into the %locations directive, which most likely is very close to what I need - however, I still found no complete working example and I'm not sure how this is to be used.
So, here I am with a step-by-step solution:
We add the %locations directive in our grammar file (between %} and the first %%)
We make sure that our lexer file contains an include for our parser (e.g. #include "mygrammar.tab.h"), at the top
We add the %option yylineno option in our lexer file (between %} and the first %%)
And now, in our yyerror function (which will supposedly be in our lexer file), we may freely use this... yylineno (= current line in file being processed) :
void yyerror(const char *str)
{
fprintf(stderr,"Error | Line: %d\n%s\n",yylineno,str);
}
Yep. Simple as that! :-)
What worked for me was adding extern int yylineno to the .ypp file:
/* parser.ypp */
%{
extern int yylineno;
%}
/* scanner.lex */
...
%option yylineno
Bison ships with a number of examples to demonstrate its features, see /usr/local/share/doc/bison/examples on your machine (where the prefix /usr/local depends on your configuration).
These examples in particular might be of interest to you:
lexcalc uses precedence directives and location tracking. It uses Flex to generate the scanner.
bistromathic demonstrates best practices when using Bison.
Its hand-written scanner tracks locations.
Its interface is pure.
It uses %params to pass user information to the parser and scanner.
Its scanner uses the error token to signal lexical errors and enter
error recovery.
Its interface is "incremental", well suited for interaction: it uses the
push-parser API to feed the parser with the incoming tokens.
It features an interactive command line with completion based on the
parser state, based on yyexpected_tokens.
It uses Bison's standard catalog for internationalization of generated
messages.
It uses a custom syntax error with location, lookahead correction and
token internationalization.
Error messages quote the source with squiggles that underline the error:
> 123 456
1.5-7: syntax error: expected end of file or + or - or * or / or ^ before number
1 | 123 456
| ^~~
It supports debug traces with semantic values.
It uses named references instead of the traditional $1, $2, etc.
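To give a feel for the "quote the source with squiggles" style of message, here is a small C++ sketch. It is not bistromathic's code; render_diagnostic is a hypothetical helper, and the layout (five-character line-number gutter, 1-based inclusive columns) is only loosely modelled on the sample output above.

```cpp
#include <string>

// Hypothetical helper: formats a message, the offending source line, and a
// marker line that underlines columns first_col..last_col (1-based, inclusive).
std::string render_diagnostic(const std::string& source_line, int line,
                              int first_col, int last_col,
                              const std::string& message) {
    if (first_col < 1) first_col = 1;               // guard against bad input
    if (last_col < first_col) last_col = first_col;

    const std::string num = std::to_string(line);
    // Right-align the line number in a gutter at least five characters wide.
    const std::string pad(num.size() < 5 ? 5 - num.size() : 0, ' ');

    std::string out = num + "." + std::to_string(first_col) + "-" +
                      std::to_string(last_col) + ": " + message + "\n";
    out += pad + num + " | " + source_line + "\n";
    out += std::string(pad.size() + num.size(), ' ') + " | ";
    out += std::string(static_cast<std::string::size_type>(first_col - 1), ' ');
    out += "^";
    out += std::string(static_cast<std::string::size_type>(last_col - first_col), '~');
    out += "\n";
    return out;
}
```

With location tracking enabled, the four numbers would come from the location passed to the error function; the source line itself has to be kept around by the scanner or re-read from the input.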

Put the code generated by flex to a normal C++ program

I created a simple file using flex, which generates a file lex.yy.c. Now I want to use it in a C++ program.
%{
#include <stdio.h>
%}
%%
stop printf("Stop command received\n");
start printf("Start command received\n");
%%
When I type start or stop on the command line, there is output. What I want is to supply the input from my C++ program and have the output stored in a variable in my program. Is that possible? Thanks a lot!
I know the code I posted is quite simple, but the result I imagine is:
create a C file with flex and bison and use it like a header, so in the C++ program I just need to call a function lex_yacc() to use it. For example, if lex_yacc() is a calculator, I send an expression with variables to this function and it returns the result. I want to use this function in a C++ program, and I am confused... Thanks a lot!
See the section about multiple input buffers in the manual. Especially the section about yy_scan_string and yy_scan_bytes.
For the "output", of course there is "output" when you give "stop" or "start" as input; you explicitly produce it (i.e. the printf calls). You can put any code you want there.

Compile a program with local file embedded as a string variable?

Question should say it all.
Let's say there's a local file "mydefaultvalues.txt", separated from the main project. In the main project I want to have something like this:
char * defaultvalues = " ... "; // here should be the contents of mydefaultvalues.txt
And let the compiler swap " ... " with the actual contents of mydefaultvalues.txt. Can this be done? Is there like a compiler directive or something?
Not exactly, but you could do something like this:
defaults.h:
#define DEFAULT_VALUES "something something something"
code.c:
#include "defaults.h"
char *defaultvalues = DEFAULT_VALUES;
Where defaults.h could be generated, or otherwise created however you were planning to do it. The pre-processor can only do so much. Making your files in a form that it will understand will make things much easier.
The trick I did, on Linux, was to have in the Makefile this line:
defaultvalues.h: defaultvalues.txt
xxd -i defaultvalues.txt > defaultvalues.h
Then you could include:
#include "defaultvalues.h"
This defines both unsigned char defaultvalues_txt[], with the contents of the file, and unsigned int defaultvalues_txt_len, with the size of the file.
Note that defaultvalues_txt is not null-terminated, thus, not considered a C string. But since you also have the size, this should not be a problem.
EDIT:
A small variation would allow me to have a null-terminated string:
echo "char defaultvalues[] = { " `xxd -i < defaultvalues.txt` ", 0x00 };" > defaultvalues.h
Obviously will not work very well if the null character is present inside the file defaultvalues.txt, but that won't happen if it is plain text.
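For what it's worth, here is a small C++ sketch of how the generated array might be consumed. The two definitions at the top are a hand-written stand-in for the header xxd would generate (the bytes spell a made-up file containing "a=1" and a newline), and load_defaults is a hypothetical helper name.

```cpp
#include <string>

// Stand-in for the contents of defaultvalues.h as `xxd -i` would write it.
// In a real build these two definitions come from the generated header.
unsigned char defaultvalues_txt[] = { 0x61, 0x3d, 0x31, 0x0a };
unsigned int defaultvalues_txt_len = 4;

// Copy the embedded bytes into a std::string. The explicit length is used
// because the array is not null-terminated.
std::string load_defaults() {
    return std::string(reinterpret_cast<const char*>(defaultvalues_txt),
                       defaultvalues_txt_len);
}
```

Using the (pointer, length) constructor avoids relying on a terminator, so it also works if the file happens to contain a null byte.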
One way to achieve compile-time trickery like this is to write a simple script in some interpreted programming language (e.g. Python, Ruby or Perl will do great) which does a simple search and replace. Then just run the script before compiling.
Define your own #pragma XYZ directive which the script looks for and replaces with code that declares the variable with the file contents in a string.
char * defaultvalues = ...
where ... contains the text string read from the given text file. Be sure to compensate for line length, new lines, string formatting characters and other special characters.
Edit: lvella beat me to it with far superior approach - embrace the tools your environment supplies you. In this case a tool which does string search and replace and feed a file to it.
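The escaping step mentioned above ("new lines, string formatting characters and other special characters") could look roughly like this. It is written in C++ rather than a scripting language only to keep the examples on this page in one language; to_c_literal is a hypothetical helper, and it does not deal with trigraphs or compiler limits on literal length.

```cpp
#include <string>

// Hypothetical helper: turns raw file contents into C string-literal source
// text, splitting it into one adjacent literal per input line.
std::string to_c_literal(const std::string& text) {
    std::string out = "\"";
    for (unsigned char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\t': out += "\\t"; break;
            case '\r': out += "\\r"; break;
            case '\n':
                // Emit \n, close this literal, and open a new one on the
                // next source line; adjacent literals are concatenated.
                out += "\\n\"\n\"";
                break;
            default:
                if (c < 0x20 || c >= 0x7f) {
                    // Three-digit octal escape, so a following digit in the
                    // text cannot be absorbed into the escape.
                    out += '\\';
                    out += static_cast<char>('0' + ((c >> 6) & 7));
                    out += static_cast<char>('0' + ((c >> 3) & 7));
                    out += static_cast<char>('0' + (c & 7));
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    out += "\"";
    return out;
}
```

The script would then write something like char *defaultvalues = followed by the returned text and a semicolon into the generated source file.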
Late answer I know but I don't think any of the current answers address what the OP is trying to accomplish although zxcdw came really close.
All any 7 year old has to do is load your program into a hex editor and hit CTRL-S. If the text is in your executable code (or vicinity) or application resource they can find it and edit it.
If you want to prevent the general public from changing a resource or static data just encrypt it, stuff it in a resource then decrypt it at runtime. Try DES for something small to start with.

Determining line number and file name of the perl file from within C++

I am working with Perl embedded in our application. We have installed quite a few C++ functions that are called from within Perl. One of them is a logging function. I would like to add the file name and line number of the Perl file that called this function to the log message.
I know on the Perl side I can use the "caller()" function to get this information, but this function is already used in hundreds of locations, so I would prefer to modify the C++ side. Is this information passed to the C++ XSUB functions, and if so, how would I get at it?
Thanks.
This should work:
char *file;
I32 line;
file = OutCopFILE(PL_curcop);
line = CopLINE(PL_curcop);
Control ops (cops) are one of the two ops OP_NEXTSTATE and OP_DBSTATE,
that (loosely speaking) are separate statements.
They hold information important for lexical state and error reporting.
At run time, PL_curcop is set to point to the most recently executed cop,
and thus can be used to determine our current state.
— cop.h
Can't you call perl builtins from XS? I confess I don't know.
If not, you could always do something like this:
sub logger { _real_logger(caller, @_) }
assuming logger is what your function is called (and you rename your C++ XS function to _real_logger). You could also do this, presumably, if you need to hide yourself in the call tree:
sub logger {
unshift @_, caller;
goto &_real_logger;
}
which is of course the normal form of goto used in AUTOLOAD.
These will add overhead, of course, but probably not a big deal for a logging function.

How to tell flex and bison to stop processing input?

What is the best way to get flex and bison to stop processing when an error is encountered? If I call yyerror, it does not stop scanning and parsing my file. The input is syntactically correct, but there is a user error, such as trying to load the same file twice. Once I am out of flex/bison, my program will return an error to the user and should keep running. I assume that throwing a C++ exception would probably break something?
YYABORT is the standard way of getting out; it causes yyparse to return immediately with a failure (1). You can then throw an exception or do whatever you want. You'll need to reset flex's input if you want to parse something else, but if you do, you can just call yyparse again and parsing will start over from the beginning.
YYACCEPT stops the parse and makes yyparse return 0.
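A sketch of the "abort first, throw afterwards" pattern in C++ might look like the following. Everything here is hypothetical scaffolding: g_parse_error stands for a variable a grammar action would set just before YYABORT, parse_or_throw would be called with yyparse in a real program, and the two fake_parse_* functions only imitate yyparse's 0/1 return value so the sketch is self-contained.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical slot that a grammar action would fill in right before
// executing YYABORT (e.g. "file loaded twice").
std::string g_parse_error;

// Runs a yyparse-like function and converts a non-zero result into an
// exception. The throw happens only after the parser has returned.
void parse_or_throw(int (*parse)()) {
    g_parse_error.clear();
    const int rc = parse();
    if (rc != 0) {
        throw std::runtime_error(
            g_parse_error.empty() ? "parse failed" : g_parse_error);
    }
}

// Stand-ins for yyparse, for illustration only.
int fake_parse_ok() { return 0; }
int fake_parse_abort() {
    g_parse_error = "file loaded twice";  // what the action would record
    return 1;                             // what yyparse returns after YYABORT
}
```

Because the exception is raised after the parser function has returned, nothing unwinds through the generated parser code; the scanner's input still needs to be reset before the next parse, as noted above.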