Bison Grammar: yylval is embedded in yyparse

No wonder I can't link to it from my flex file.
I checked this, removed the declaration "YYSTYPE yylval;" from the beginning of yyparse, and now it works as intended. Surely this is not the correct way to use bison and flex? Can somebody show me another way?
Thank you.

It is normal for yylval to be declared and defined in the y.tab.c file output by bison. It is also declared (as extern) in the y.tab.h file, so if you include that header in your lexer, you can access yylval as a global variable. This is the normal way flex and bison work together, and there should be no need to edit the generated files to take things out; it should 'just work'.
This use of a global variable causes problems if you want more than one parser in a program, or want to run parsers in different threads (or otherwise simultaneously). Bison provides a way to avoid this with %define api.pure, which turns yylval into a local variable of yyparse instead of a global (presumably what you are seeing in your generated parser). The parser then calls yylex with the address of a YYSTYPE (a pointer), and the lexer should put the token value there instead of in yylval. If you're using flex, you'll want #define YY_DECL int yylex(YYSTYPE *val) in the %{ ... %} block at the top of your flex file to change the declaration it uses for yylex.
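To make the pure calling convention concrete, here is a minimal C++ sketch in which a hand-written function stands in for the flex-generated scanner. The YYSTYPE union with its num member, the token code NUM and the scan_string helper are hypothetical; in a real project YYSTYPE and the token codes come from the bison-generated header, and the input handling is done by flex.

```cpp
#include <cassert>
#include <cctype>
#include <vector>

// Hypothetical stand-ins for what the bison-generated header would declare.
union YYSTYPE {
    int num;
};
enum { END = 0, NUM = 258 };   // bison numbers named tokens from 258 by default

// Scanner input position; a flex scanner keeps this kind of state itself.
static const char* scan_pos = "";
void scan_string(const char* s) { scan_pos = s; }   // hypothetical helper

// Same shape as: #define YY_DECL int yylex(YYSTYPE *val)
// The token value goes through the pointer, not into a global yylval.
int yylex(YYSTYPE* val) {
    while (*scan_pos == ' ') ++scan_pos;             // skip blanks
    if (std::isdigit(static_cast<unsigned char>(*scan_pos))) {
        int n = 0;
        while (std::isdigit(static_cast<unsigned char>(*scan_pos)))
            n = n * 10 + (*scan_pos++ - '0');
        val->num = n;
        return NUM;
    }
    return END;   // in this toy, anything that is not a number ends the input
}

// Roughly what a pure parser does: yylval is a local of the parse
// function, and its address is handed to yylex on every call.
std::vector<int> collect_numbers() {
    std::vector<int> out;
    YYSTYPE yylval;
    while (yylex(&yylval) == NUM)
        out.push_back(yylval.num);
    return out;
}
```

Because nothing in the scanner refers to a global yylval, the value ends up wherever the caller points it, which is what makes several parsers in one program possible.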

Instead of using
#define YY_DECL int yylex(YYSTYPE *val)
you can also use
%option bison-bridge
But if you want to write a flex+bison parser in C++, then this method does not work.
For C++ parsers, check this example out.

I have checked this and taken out the
declaration "YYSTYPE yylval;"
I wonder whether something went wrong with the way you took it out, but you could try
bison -d your-yacc-file.y
Bison will then generate a header file (for example your-yacc-file.tab.h) containing all those declarations, which you can include in your flex file.

Related

Flex C++ - #ifdef inside flex block

I want to define a preprocessor constant that enables matching certain patterns only when it is defined. Is this possible, or is there another way to deal with this problem?
For example, a simplified version of removing one-line comments in C:
%{
#define COMMENT
%}
%%
#ifdef COMMENT
[\/][\/].*$ ;
#endif
[1-9][0-9]* printf("It's a number, and it works with and without defining COMMENT");
%%

There is no great solution to this (very reasonable) request, but there are some possibilities.
(F)lex start conditions
Flex start conditions make it reasonably simple to define a few optional patterns, but they don't compose well. This solution works best if you have only a single controlling variable, since you would have to define a separate start condition for every possible combination of controlling variables.
For example:
%s NO_COMMENTS
%%
<NO_COMMENTS>"//".* ; /* Ignore comments in NO_COMMENTS mode. */
The %s declaration means that all unmarked rules also apply in the NO_COMMENTS state; you will commonly see %x ("exclusive") in examples, but that would force you to explicitly mark almost every rule.
Once you have modified your scanner definition in this way, you can select the appropriate set of rules at run time by setting the lexer's state with BEGIN(INITIAL) or BEGIN(NO_COMMENTS). (The BEGIN macro is only defined in the flex-generated file, so you will want to export a function which performs one of these two actions.)
Using cpp as a utility.
There is no preprocessor feature in flex. It's possible that you could use a C preprocessor to preprocess your flex file before passing it to flex, but you will have to be very careful with your input file:
The C preprocessor expects its input to be a sequence of valid C preprocessor tokens. Many common flex patterns will not match this assumption, because of the very different quoting rules. (For a simple example, a common pattern to recognise C comments includes the character class [^/*] which will be interpreted by the C preprocessor as containing the start of a C comment.)
The flex input file is likely to have a number of lines which are valid #include directives. There is no way to prevent these directives from being expanded (other than removing them from the file). Once expanded and incorporated into the source, the header files no longer have include guards, so you will have to tell flex not to insert any #include files from its own templates. I believe that is possible, but it will be a bit fragile.
The C preprocessor may expand what looks to it like a macro invocation.
The C preprocessor might not preserve linear whitespace, altering the meaning of the flex scanner definition.
m4 and other preprocessors
It would be safer to use m4 as a preprocessor, but of course that means learning m4. (You shouldn't need to install it, because flex already depends on it: if you have flex, you also have m4.) You will still need to be very careful with quoting sequences. M4 lets you customize these sequences, so it is more manageable than cpp, but don't copy the common idiom of defining [[ as a quote delimiter; [[ is very common inside regular expressions.
Also, m4 does not insert #line directives, and any non-trivial use will change the number of input lines, making error messages harder to interpret (to say nothing of the challenge of debugging). You can probably avoid this issue in this very simple case, but it will reappear in more complex ones.
You could also write your own simple preprocessor, but you will still need to address the above issues.

Using _EXFUN () in C

https://www.gnu.org/software/m68hc11/examples/stdio_8h-source.html
What does _EXFUN() mean in the header at the link above, and what does it do? I have seen this code fragment:
#ifndef _EXFUN
# define _EXFUN(N,P) N P
#endif
What does this code do? I would be very happy if you could explain it. Thanks

I just tested it, and it simply places its two arguments side by side, so it can even chain two statements:
_EXFUN(printf("Hello, ");, printf("World!\n");)
Apparently, this is a hack so that the same declaration can be preprocessed differently depending on how _EXFUN is defined in the header file. It mainly has to do with declarations, but I cannot give an example of when it is useful.

The macro is used to abstract over the way functions are declared, mainly in C headers (.h files). This ensures the headers compile whatever the platform and the version of the compiler (ANSI C, which has prototypes, or pre-ANSI plain C, which does not).
For example, the declaration int _EXFUN(scanf, (const char *, ...)); expands to int scanf(const char *, ...);
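A small sketch of that expansion. It uses a hypothetical macro named EXFUN in place of _EXFUN, because identifiers that begin with an underscore followed by a capital letter are reserved for the implementation (which is why newlib may use that spelling and user code should not). The STR/XSTR pair is the usual two-level stringification trick, used here only to make the expansion visible; a header targeting a pre-ANSI compiler could instead define the macro to drop the parameter list.

```cpp
#include <cassert>
#include <cstring>

// Default definition, as in the header: just paste the name and the parameter list.
#ifndef EXFUN
# define EXFUN(N, P) N P
#endif

// Two-level stringification: XSTR expands its argument first, STR then quotes it.
#define STR(x) #x
#define XSTR(x) STR(x)

// What the header writes, turned into a string so we can see what the compiler gets.
const char* const scanf_decl = XSTR(EXFUN(scanf, (const char *, ...)));

// A declaration written through the macro is an ordinary prototype:
int EXFUN(add, (int a, int b));   // becomes: int add (int a, int b);

int add(int a, int b) { return a + b; }
```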

In C/C++, any line that starts with # is a preprocessor directive. This means that before the program is compiled, the preprocessor handles the lines that start with # and performs whatever operation they specify. #if, #ifdef and #ifndef include the code within their block only if the condition is met; if it is not, the code in the block is not compiled. #define is used for macro definitions and simple text replacement.
In the code you reference, if the macro is not already defined, it is defined so that the declarations that follow it are turned into valid C syntax. I suppose this could be useful if somebody wanted the stdio functions to do something else or be declared a different way. They would achieve that by defining _EXFUN before stdio.h was included.
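A sketch of that override scenario, again using a hypothetical EXFUN in place of the reserved _EXFUN. The including code defines the macro first, so the header's #ifndef block is skipped and every declaration takes the caller's expansion. Adding noexcept is an arbitrary choice for illustration of "declaring the functions a different way"; it is not something newlib itself does.

```cpp
#include <cassert>

// 1. The including program defines its own expansion before the header.
#define EXFUN(N, P) N P noexcept

// 2. Header excerpt: supply the default only if nobody defined the macro yet.
#ifndef EXFUN
#  define EXFUN(N, P) N P
#endif

// 3. The header's declaration now expands to: int scale (int value, int factor) noexcept;
int EXFUN(scale, (int value, int factor));

// The definition has to repeat the same exception specification.
int scale(int value, int factor) noexcept { return value * factor; }

// Checked at compile time: the caller's definition of the macro took effect.
static_assert(noexcept(scale(1, 2)), "EXFUN override should add noexcept");
```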

How to use flex with my own parser?

I want to leave the lexical analysis to lex but develop the parser on my own.
I made a token.h header containing an enum of token types and a simple class hierarchy.
For the lex rule:
[0-9]+ {yylval = new NumToken(std::stoi(yytext));return NUM;}
How do I get the NumToken pointer from the parser code?
Suppose I just want to print out the tokens:
while(true)
{
auto t = yylex();
if (t == 0) break; // the scanner returns 0 at end of input
//std::cout <<yylval.data<<std::endl; // What goes here ?
}
I can do this with yacc/bison, but cannot find any documentation or example showing how to do it manually.

In a traditional bison/flex parser, yylval is a global variable defined in the parser generated by bison, and declared in the header file generated by bison (which should be #include'd into the generated scanner). So a simple solution would be just to replicate that: declare yylval (as a global) in token.h and define it somewhere in your parser.
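A minimal sketch of that global-variable arrangement, squeezed into one file with comments marking which part would live where. The token codes, the collect_numbers loop and the scan_string helper are hypothetical, and the hand-written yylex only imitates what the flex rule from the question would do.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <vector>

// ---- token.h (hypothetical contents) ----
enum TokenType { END = 0, NUM = 1 };   // 0 is what a flex scanner returns at end of input
struct Token { virtual ~Token() = default; };
struct NumToken : Token {
    int data;
    explicit NumToken(int d) : data(d) {}
};
extern Token* yylval;   // declared for both the scanner and the parser
int yylex();            // normally provided by the flex-generated scanner

// ---- parser.cpp: the single definition of the global ----
Token* yylval = nullptr;

// ---- stand-in for the flex-generated scanner ----
static const char* scan_pos = "";
void scan_string(const char* s) { scan_pos = s; }   // hypothetical helper

int yylex() {
    while (*scan_pos == ' ') ++scan_pos;
    if (!std::isdigit(static_cast<unsigned char>(*scan_pos))) return END;
    const char* start = scan_pos;
    while (std::isdigit(static_cast<unsigned char>(*scan_pos))) ++scan_pos;
    // Same effect as: [0-9]+ { yylval = new NumToken(std::stoi(yytext)); return NUM; }
    yylval = new NumToken(std::stoi(std::string(start, scan_pos)));
    return NUM;
}

// ---- parser side: read the value straight from the global ----
std::vector<int> collect_numbers() {
    std::vector<int> out;
    while (int t = yylex()) {
        if (t == NUM) {
            std::unique_ptr<Token> tok(yylval);   // take ownership of the token
            if (auto* n = dynamic_cast<NumToken*>(tok.get()))
                out.push_back(n->data);
        }
    }
    return out;
}
```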
But modern programming style has shifted away from the use of globals (for good reason), and indeed even flex will generate scanners which do not depend on global state, if requested. To request such a scanner, specify
%option reentrant
in your scanner definition. By default, this changes the prototype of yylex to:
int yylex(yyscan_t yyscanner);
where yyscan_t is an opaque pointer. (This is C, so that means it's a void*.) You can read about the details in the Flex manual; the most important takeaway is that you can ask flex to also generate a header file (with %option header-file), so that other translation units can refer to the various functions for creating, destroying and manipulating a yyscan_t, and that you need to minimally create one so that yylex has somewhere to store its state. (Ideally, you would also destroy it.) [Note 1].
The expected way to use a reentrant scanner from bison is to enable %option bison-bridge (and %option bison-locations if your lexer generates source location information for each token). This adds an additional parameter to the yylex prototype:
int yylex(YYSTYPE *yylval_param, yyscan_t scanner);
With %option bison-locations, two parameters are added:
int yylex(YYSTYPE *yylval_param,
YYLTYPE *yylloc_param,
yyscan_t scanner);
The semantic type YYSTYPE and the location type YYLTYPE are not declared by the flex-generated code. They must appear in the token.h header you #include into your scanner.
The intention of the bison-bridge parameters is to provide a mechanism to return the semantic value yylval to the caller (i.e. the parser). Since yylval is effectively the same as the parameter yylval_param [Note 2], it will be a pointer to the actual semantic value, so you need to write (for example) yylval->data = ... in your flex actions.
So that's one way to do it.
A possibly simpler alternative to bison-bridge is just to provide your own yylex prototype, which you can do with the macro YY_DECL. For example, you could do something like this (if YYSTYPE were something simple):
#define YY_DECL std::pair<int, YYSTYPE> yylex(yyscan_t yyscanner)
Then a rule could just return the pair:
[0-9]+ {return std::make_pair(NUM, new NumToken(std::stoi(yytext)));}
Obviously, there are many variants on this theme.
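A sketch of the consumer side of that YY_DECL variant. A hand-written function stands in for the generated scanner, YYSTYPE is assumed to be a plain Token*, and yyscan_t is replaced by a hypothetical Scanner struct, since the real yyscan_t only exists in flex-generated code. One practical detail: a real flex scanner's default end-of-file action is yyterminate(), which returns 0, and 0 does not convert to a std::pair. Flex allows yyterminate() to be redefined (for example to return std::make_pair(0, YYSTYPE{})), or you can add an <<EOF>> rule that returns such a pair.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Token { virtual ~Token() = default; };
struct NumToken : Token {
    int data;
    explicit NumToken(int d) : data(d) {}
};
using YYSTYPE = Token*;          // "something simple"
constexpr int END = 0;           // hypothetical token codes
constexpr int NUM = 258;

struct Scanner { const char* pos; };   // hypothetical stand-in for yyscan_t's state

// Same shape as: #define YY_DECL std::pair<int, YYSTYPE> yylex(yyscan_t yyscanner)
std::pair<int, YYSTYPE> yylex(Scanner* s) {
    while (*s->pos == ' ') ++s->pos;
    if (!std::isdigit(static_cast<unsigned char>(*s->pos)))
        return std::make_pair(END, YYSTYPE{});   // what a redefined yyterminate() would return
    const char* start = s->pos;
    while (std::isdigit(static_cast<unsigned char>(*s->pos))) ++s->pos;
    // The rule's action: return std::make_pair(NUM, new NumToken(std::stoi(yytext)));
    return std::make_pair(NUM, new NumToken(std::stoi(std::string(start, s->pos))));
}

// Caller: token code and value arrive together; no global yylval is involved.
std::vector<int> collect_numbers(const char* text) {
    Scanner sc{text};
    std::vector<int> out;
    for (auto tok = yylex(&sc); tok.first != END; tok = yylex(&sc)) {
        std::unique_ptr<Token> owned(tok.second);   // the scanner allocated it
        if (auto* n = dynamic_cast<NumToken*>(owned.get()))
            out.push_back(n->data);
    }
    return out;
}
```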
Notes
[Note 1] Unfortunately, the generated header includes quite a lot of unnecessary baggage, including a bunch of macro definitions for the standard "globals" which won't work because in a reentrant scanner these variables can only be used in a flex action.
[Note 2] The scanner generated with bison-bridge defines yylval as a macro which refers to a field in the opaque state structure, and stores yylval_param into this field. yyget_lval and yyset_lval functions are provided in order to get or set this field from outside of yylex. I don't know why; it seems somewhere between unnecessary and dangerous, since the state will contain the pointer to the value, as supplied in the call to yylex, which may well be a dangling pointer once the call returns.

multiple parsers using C++ api in bison, conflicts with stack.hh

When you run bison, it creates a stack class for you in "stack.hh". The file name is fixed, but the contents are wrapped in a namespace of your choosing.
If you use bison to generate two separate grammars (i.e. two .y files) in C++ mode, the "stack.hh" files conflict and one overwrites the other.
A similar thing happens with the autogenerated "location.hh" and "position.hh" classes, but there is a workaround in Bison 2.7:
%define api.location.type "foo::location"
that lets you reuse the foo grammar namespace in your bar grammar namespace.
But I can't find any way of doing the same for the "stack.hh" file.

The easiest way to deal with this is just to put the Bison files in two separate directories. Then when you generate the code the files will not conflict, assuming each set of files gets generated in the same location as the corresponding Bison file.

Passing the caller FILE LINE to a function without using macro

I'm used to this:
class Db {
public:
void _Commit(const char *file, int line) {
Log("Commit called from %s:%d", file, line);
}
};
#define Commit() _Commit(__FILE__, __LINE__)
but the big problem is that I redefine the word Commit globally, and in a 400k-line application framework that is a problem. I don't want to use a more specific name like DbCommit, because I dislike redundancies like db->DbCommit(), and passing the values manually everywhere, as in db->Commit(__FILE__, __LINE__), is worse.
So, any advice?

So, you're looking to do logging (or something) with file & line info, and you would rather not use macros, right?
At the end of the day, it simply can't be done in standard C++ before C++20. No matter what mechanism you choose (inline functions, templates, default parameters, or something else), if you don't use a macro, you'll end up with the file name and line number of the logging function rather than those of the call site.
Use macros. This is one place where they are really not replaceable.
EDIT:
Even the C++ FAQ says that macros are sometimes the lesser of two evils.
EDIT2:
As Nathon says in the comments below, in cases where you do use macros, it's best to be explicit about it. Give your macros macro-y names, like COMMIT() rather than Commit(). This will make it clear to maintainers & debuggers that there's a macro call going on, and it should help in most cases to avoid collisions. Both good things.
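A sketch of that advice applied to the question's class: the macro gets an upper-case name, and the member it forwards to is renamed to the hypothetical CommitAt (avoiding the reserved leading-underscore-plus-capital spelling of _Commit). Here the member returns the formatted message instead of calling Log, purely so the behaviour can be checked.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

class Db {
public:
    // The real work; callers normally go through COMMIT() below.
    std::string CommitAt(const char* file, int line) {
        char msg[256];
        std::snprintf(msg, sizeof msg, "Commit called from %s:%d", file, line);
        return msg;   // a real implementation would pass this to Log(...)
    }
};

// Upper-case name: obviously a macro, and unlikely to collide with an
// ordinary identifier called Commit elsewhere in the framework.
#define COMMIT() CommitAt(__FILE__, __LINE__)
```

Usage stays close to the original: db.COMMIT() or db->COMMIT() expands to a call of CommitAt with the caller's file and line.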

With C++20, you can use std::source_location:
https://en.cppreference.com/w/cpp/utility/source_location
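A sketch of the source_location approach. The key property is that std::source_location::current() used as a default argument is evaluated at the call site. Because not every toolchain runs in C++20 mode, the sketch falls back to the GCC/Clang builtins __builtin_FILE() and __builtin_LINE(), which behave the same way as default arguments but are a compiler extension, not standard C++. The Log helper here is hypothetical and just formats the message.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#if __has_include(<source_location>)
#  include <source_location>
#endif

class Db {
public:
#if defined(__cpp_lib_source_location)
    // C++20: the default argument is evaluated where Commit() is called,
    // so loc describes the caller, not this function.
    std::string Commit(std::source_location loc = std::source_location::current()) {
        return Log(loc.file_name(), static_cast<int>(loc.line()));
    }
#else
    // Fallback: GCC/Clang builtins with the same call-site behaviour.
    std::string Commit(const char* file = __builtin_FILE(), int line = __builtin_LINE()) {
        return Log(file, line);
    }
#endif

private:
    static std::string Log(const char* file, int line) {
        char msg[512];
        std::snprintf(msg, sizeof msg, "Commit called from %s:%d", file, line);
        return msg;
    }
};
```

Callers write plain db->Commit(), and no macro is involved at the call site.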

You can combine a default parameter with a preprocessor trick to pass the caller's file name to a function. It works as follows:
Function declaration:
static const char *db_caller_file = CALLER_FILE;
class Db {
public:
void _Commit(const char *file = db_caller_file) {
Log("Commit called from %s", file);
}
};
Declare db_caller_file variable in the class header file.
Each translation unit will have its own const char *db_caller_file. It is static, so the copies do not interfere between translation units (no multiple-definition errors).
Now for CALLER_FILE itself: it is a macro, defined through gcc's command-line parameters. If you use an automated Make system with a generic rule for source files, this is easy: add a flag to that rule which defines the macro with the file's name as its value. For example:
CFLAGS= -MMD -MF $(DEPS_DIR)/$<.d -Wall -D'CALLER_FILE="$<"'
-D defines a macro before the file is compiled.
$< is Make's automatic variable for the name of the first prerequisite of the rule, which in this case is the source file. So each translation unit will have its own db_caller_file variable whose value is a string containing the file's name.
The same idea cannot be applied for the caller line, because each call in the same translation unit should have different line numbers.
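A single-file sketch of this scheme. The fallback value "unknown-source-file" is a hypothetical default so the code also compiles without the -D flag, and Commit returns the message instead of calling Log so it can be checked. Using __FILE__ directly in the header would not work here, because it would expand to the header's own name.

```cpp
#include <cassert>
#include <string>

// Normally set per translation unit by the build, e.g. -D'CALLER_FILE="$<"'.
// Hypothetical fallback so the sketch compiles without that flag.
#ifndef CALLER_FILE
#define CALLER_FILE "unknown-source-file"
#endif

// ---- db.h ----
// static gives internal linkage: every .cpp that includes db.h gets its own
// copy, initialised from that file's CALLER_FILE.
static const char* db_caller_file = CALLER_FILE;

class Db {
public:
    // The default argument is evaluated at each call and reads the copy
    // belonging to the caller's translation unit.
    std::string Commit(const char* file = db_caller_file) {
        return std::string("Commit called from ") + file;
    }
};
```

Strictly speaking, a class definition that refers to a different internal-linkage variable in each translation unit runs against the one-definition rule, even though common compilers behave as described; treat this as a pragmatic trick rather than a portable guarantee.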
