I've been working with the Boost mini compiler example. Here is the root of the source code: http://www.boost.org/doc/libs/1_59_0/libs/spirit/example/qi/compiler_tutorial/mini_c/
The snippet that interests me is in statement_def.hpp
The problem I am having is that if you attach semantic actions, for example like this:
statement_ =
variable_declaration[print_this_declaration]
| assignment
| compound_statement
| if_statement
| while_statement
| return_statement
;
and subsequently run the mini_c compiler on a sample program like:
int foo(n) {
if (n == 3) { }
return a;
}
int main() {
return foo(10);
}
It triggers the "Duplicate Function Error" found within the "compile.cpp" file (found using the above link). Here is that snippet for quick reference:
if (functions.find(x.function_name.name) != functions.end())
{
error_handler(x.function_name.id, "Duplicate function: " + x.function_name.name);
return false;
}
For the life of me, I can't figure out why.
I'm not really sure how to characterize this problem, but it seems that somehow whatever is sent to standard out is being picked up by the parser as valid code to parse (but that seems impossible in this scenario).
Another possibility is the semantic action is somehow binding external data to a symbol table, where it is again considered to be part of the originally-parsed input file (when it shouldn't be).
The last and most likely option is that I don't fully understand the minutiae of this example (or Boost, for that matter), and that somewhere a pointer/reference/iterator is being shifted to another memory location when it shouldn't be (as a result of the semantic action), throwing the whole mini-compiler into disarray.
[...] it seems that somehow whatever is sent to standard out is being picked up by the parser as valid code to parse
As unlikely as it seemed... it is indeed :) No magic occurs.
Another possibility is the semantic action is somehow binding external data to a symbol table, where it is again considered to be part of the originally-parsed input file (when it shouldn't be).
You're not actually far off here. It's not so much "external" data, though. It's binding uninitialized data to the symbol table. And it actually tries to do that twice.
Step by step:
Qi rules that have semantic actions don't do automatic attribute propagation by default. It is assumed that the semantic action will be in charge of assigning a value to the attribute exposed.
This is the root cause. See documentation: Rule/Expression Semantics
Also: How Do Rules Propagate Attributes
Because of this, the actual attribute exposed by the statement_ rule will be a default-constructed object of type ast::statement:
qi::rule<Iterator, ast::statement(), skipper<Iterator> > statement_;
This type ast::statement is a variant, and a default constructed variant holds a default-constructed object of the first element type:
typedef boost::variant<
variable_declaration
, assignment
, boost::recursive_wrapper<if_statement>
, boost::recursive_wrapper<while_statement>
, boost::recursive_wrapper<return_statement>
, boost::recursive_wrapper<statement_list>
>
statement;
Lo and behold, that object is of type variable_declaration!
struct variable_declaration {
identifier lhs;
boost::optional<expression> rhs;
};
So, each time the statement_ rule matches, the AST will be interpreted as a declaration of a variable whose identifier name is the empty string "". (Needless to say, the initializer (rhs) is also empty.)
The second time this declaration is encountered violates the rule that duplicate names cannot exist in the "symbol table".
HOW TO FIX?
You can explicitly indicate that you want automatic attribute propagation even in the presence of Semantic Actions.
Use operator%= instead of operator= to assign the rule definition:
statement_ %=
variable_declaration [print_this_declaration]
| assignment
| compound_statement
| if_statement
| while_statement
| return_statement
;
Now, everything will work again.
Related
I have a simple module in a text file mpd.ml with variant types:
type ack_error =
| Not_list
| Arg
| Password
| Permission
| Unknown
| No_exist
| Playlist_max
| System
| Playlist_load
| Update_already
| Player_sync
| Exist
type response = Ok | Error of (ack_error * int * string * string)
And when I use them:
let test_ok test_ctxt = assert_equal Mpd.Ok (Mpd.parse_response "OK\n")
Even if everything works, I have those warnings:
ocamlfind ocamlc -o test -package oUnit,str -linkpkg -g mpd.ml test.ml
File "test.ml", line 7, characters 2-4:
Warning 40: Ok was selected from type Mpd.response.
It is not visible in the current scope, and will not
be selected if the type becomes unknown.
File "test.ml", line 8, characters 2-7:
Warning 40: Error was selected from type Mpd.response.
It is not visible in the current scope, and will not
be selected if the type becomes unknown.
What does it mean, and how can I improve my code so that those warnings disappear?
** edit **
Full code: https://gist.github.com/cedlemo/8806f367a971bacfaa0f59b1c78a3605
It looks like you're not showing the line that provoked the warning. The warning says the Ok constructor is between characters 2-4, but there is nothing like that in your code.
In general, I would suggest using an editor or IDE, like Emacs or Vim, as they will jump directly to the source of the error.
Since the warning is quite common, I will still explain the reasoning behind it. In OCaml, constructors and field names are identifiers that, like any other identifier, have a scope, and that scope is the module. So whenever you define a variant type, you are actually defining several constructors in the scope of the module. To refer to a constructor, you need either to use a fully qualified name or to make sure that it is in scope. If you're in the module that defines it, then you're fine; otherwise you need to bring the name into scope somehow.
In previous versions of OCaml it was an error to use a constructor that is not in scope: just a regular unbound identifier. Later, a heuristic was added that infers which scope the constructor comes from. But it is still guarded by a warning, so people try not to rely on it. (As a digression, I wonder why a feature would be added and then immediately disgraced with a warning, so that no one will actually use it.)
So, to fix the warning you need to qualify all constructors with the module name, or, alternatively, open the module to bring all definitions into scope, e.g., open Mpd.
Update
So, the full code discloses that at line 7, as was indeed pointed out by the compiler, there is an unqualified constructor:
match response with
| Ok -> false
| Error ...
Here Ok is unqualified; the correct way is to say:
match response with
| Mpd.Ok -> false
| Mpd.Error ...
The general advice, which describes the policy that I use in particular, is to define a module that contains only types, so that you can open it rather safely. This will also save you the trouble of repeating type definitions in the .mli, as it is considered acceptable not to have an .mli file for a module that defines only types.
In fact, I don't know how to be very precise.
Today, I browsed the following page:
http://siliconframework.org/docs/hello_world.html
I found the following syntax:
GET / _hello = [] () { return D(_message = "Hello world."); }
I found that "GET" can be given a function via a lambda expression, but I cannot figure out what "/" and "_hello" mean here, and how they connect to something meaningful.
Also, what is that "_message = "?
BTW, my primary C++ knowledge is from before C++11.
I googled quite a bit.
Could any one kindly give an explanation?
This library uses what is known as an embedded Domain Specific Language, where it warps C++ and preprocessor syntax in ways that allow a seemingly different language to be just another part of a C++ program.
In short, magic.
The first bit of magic lies in:
iod_define_symbol(hello)
which is a macro that generates the identifier _hello of type _hello_t.
It also creates a _hello_t type which inherits from a CRTP helper called iod::symbol<_hello_t>.
_hello_t overrides various operators (including operator= and operator/) in ways that don't do what you'd normally expect C++ objects to do.
GET / _hello = [] () { return D(_message = "Hello world."); }
so this calls
operator=(
operator/( GET, _hello ),
/* lambda_goes_here */
);
similarly in the lambda:
D(_message = "Hello world.");
is
D( operator=(_message, "Hello world.") );
operator/ and operator= can do nearly anything.
In the D case, = doesn't do any assigning -- instead, it builds a structure that basically says: the field called "message" is assigned the value "Hello world.".
_message knows it is called "message" because it was generated by the macro iod_define_symbol(message), which took the string message, stored it with the type _message_t, and created the variable _message, an instance of that type.
D takes a number of such key/value pairs and bundles them together.
The lambda returns this bundle.
So [] () { return D(_message = "Hello world."); } is a lambda that returns a bundle of key-value pair attachments, written in a strange way.
We then invoke operator= with GET/_hello on the left hand side.
GET is another global object with operator/ overloaded on it. I haven't tracked it down. Suppose it is of type iod::get_t (I made up that name: again, I haven't looked up what type it is, and it doesn't really matter)
Then iod::get_t::operator/(iod::symbol<T> const&) is overloaded to generate yet another helper type. This type gets the T's name (in this case "hello"), and waits for it to be assigned to by a lambda.
When assigned to, it doesn't do what you expect. Instead, it goes off and builds an association between "hello" and invoking that lambda, where that lambda is expected to return a set of key-value pairs generated by D.
We then pass one or more such associations to http_api, which gathers up those bundles and builds the data required to run a web server with those queries and those responses, possibly including flags saying "I am going to be an http server".
sl::mhd_json_serve then takes that data, and a port number, and actually runs a web server.
All of this is a bunch of layers of abstraction to make some reflection easier. The structures generated both have C++ identifiers, and similar strings. The similar strings are exposed in them, and when the json serialization (or deserialization) code is generated, those strings are used to read/write the json values.
The macros merely exist to make writing the boilerplate easier.
Techniques that might be helpful to read on further include "expression templates", "reflection", "CRTP", embedded "Domain Specific Language"s if you want to learn about what is going on here.
Some of the above contains minor "lies told to children" -- in particular, the operator syntax doesn't work quite like I implied. (a/b is not equivalent to operator/(a,b), in that the second won't call member operator /. Understanding that they are just functions is what I intend, not that the syntax is the same.)
#mattheiuG (the author of this framework) has shared these slides in a comment below this post that further explains D and the _message tokens and the framework.
It's not standard C++ syntax, it's framework specific instead. The elements prefixed with an underscore (_hello, _message etc) are used with a symbol definition generator that runs and creates the necessary definitions prior to compilation.
There's some more information on it on the end of this page: http://siliconframework.org/docs/symbols.html. Qt does a similar thing with its moc tool.
I have a complicated Yacc file with a bunch of rules, some of them complicated, for example:
start: program
program: extern_list class
class: T_CLASS T_ID T_LCB field_dec_list method_dec_list T_RCB
The exact rules and the actions I take on them are not important, because what I want to do seems fairly simple: just print out the program as it appears in the source file, using the rules I define for other purposes. But I'm surprised at how difficult doing so is.
First I tried adding printf("%s%s", $1, $2) to the second rule above. This produced "��#P�#".

From what I understand, the parsed text is also available as a variable, yytext. I added printf("%s", yytext) to every rule in the file and added extern char* yytext; to the top of the file. This produced (null){void)1133331122222210101010--552222202020202222;;;;||||&&&&;;;;;;;;;;}}}}}}}} from a valid file according to the language's syntax.

Finally, I changed extern char* yytext; to extern char yytext[], thinking it would not make a difference. The difference in output it made is best shown as a screenshot
I am using Bison 3.0.2 on Xubuntu 14.04.
If you just want to echo the source to some output while parsing it, it is easiest to do that in the lexer. You don't say what you are using for a lexer, but you mention yytext, which is used by lex/flex, so I will assume that.
When you use flex to recognize tokens, the variable yytext refers to the internal buffer flex uses to recognize tokens. Within the action of a token, it can be used to get the text of the token, but only temporarily -- once the action completes and the next token is read, it will no longer be valid.
So if you have a flex rule like:
[a-zA-Z_][a-zA-Z_0-9]* { yylval.str = yytext; return T_ID; }
that likely won't work at all, as you'll have dangling pointers running around in your program; probably the source of the random-looking outputs you're seeing. Instead you need to make a copy. If you also want to output the input unchanged, you can do that here too:
[a-zA-Z_][a-zA-Z_0-9]* { yylval.str = strdup(yytext); ECHO; return T_ID; }
This uses the flex macro ECHO which is roughly equivalent to fputs(yytext, yyout) -- copying the input to a FILE * called yyout (which defaults to stdout)
If the first symbol in the corresponding right-hand side is a terminal, $1 in a bison action means "the value of yylval produced by the scanner when it returned the token corresponding to that terminal". If the symbol is a non-terminal, then it refers to the value assigned to $$ during the evaluation of the action which reduced that non-terminal. If there was no such action, then the default $$ = $1 will have been performed, so it will pass through the semantic value of the first symbol in the reduction of that non-terminal.
I apologize if all that was obvious, but your snippet is not sufficient to show:
what the semantic types are for each non-terminal;
what the semantic types are for each terminal;
what values, if any, are assigned to yylval in the scanner actions;
what values, if any, are assigned to $$ in the bison actions.
If any of those semantic types are not, in fact, character strings, then the printf will obviously produce garbage. (gcc might be able to warn you about this, if you compile the generated code with -Wall. Despite the possibility of spurious warnings if you are using old versions of flex/bison, I think it is always worthwhile compiling with -Wall and carefully reading the resulting warnings.)
Using yytext in a bison action is problematic, since it will refer to the text of the last token scanned, typically the look-ahead token. In particular, at the end of the input, yytext will be NULL, and that is what you will pick up in any reductions which occur at the end of input. glibc's printf implementation is nice enough to print (null) instead of segfaulting when you provide (char*)0 for an argument formatted as %s, but I don't think it's a great idea to depend on that.
Finally, if you do have a char* semantic value, and you assign yylval = yytext (or yylval.sval = yytext; if you are using unions), then you will run into another problem, which is that yytext points into a temporary buffer owned by the scanner, and that buffer may have completely different contents by the time you get around to using the address. So you always need to make a copy of yytext if you want to pass it through to the parser.
If what you really want to do is see what the parser is doing, I suggest you enable bison's yydebug parser-trace feature. It will give you a lot of useful information, without requiring you to insert printf's into your bison actions at all.
I have the following enumerator and it's likely to be expanded over the course of program development:
enum myEnum {
Element1,
Element2,
Element3,
...
ElementX,
Last
};
I have a function that uses the enumerator in the following way:
bool CheckEnumValidity(myEnum a)
{
bool valid = false;
switch (a) {
case Element1:
case Element2:
case Element3:
case ...
case ElementX:
valid = true;
break;
case Last:
valid = false;
break;
};
return valid;
}
QUESTIONS:
1) I duplicate Element1, Element2, etc. in two places in my program. How can I get rid of the duplication in the safest way?
2) Should I have default behavior that throws an exception (or returns false) in the aforementioned switch statement, given that CheckEnumValidity() takes an argument of myEnum type?
NOTES:
C++ 11 is unavailable for my application.
Provided that your enum really doesn't contain any explicit value assignments, you can write:
if (a <= Last) {
return (a < Last);
} else {
throw AnyExceptionYouWant();
}
It would probably be easier, through coding guidelines, peer pressure, policy enforcement (sack any programmer who does not comply with the coding guideline) or other means, to ensure that calling code which uses your enum only ever supplies named values.
In other words, disallow conversion of an integral value to an enumerated type. After all, doing such things negates most of the reason for using an enumerated type in the first place.
If, despite this suggestion, you want to test, I'd write a little program that parses your header files, finds all the enum types, and automatically generates your CheckEnumValidity(). With makefiles, it is easy to ensure that program is run whenever relevant header files change, which means the function would be updated, recompiled, and linked into your program to keep the check consistent with the type definition.
As to whether your check function should throw an exception, that comes down to what the consequences of a value failing the test are. If your program should really not continue, then throw an exception. If the error is benign and your program can continue somehow, simply log an error message (e.g. to std::cerr) and continue.
To answer your first question, there is no very straightforward way to do this in C++, though I will leave a comment by your question pointing to some approaches.
For your second question, I recommend you use a default case. Here is why. The first reason is weaker, but the last two are stronger.
Someone may convert an integer explicitly to an enumerated value without checking that it is valid. This should be forbidden, but it still sometimes happens, and you should catch this programming error at run time if it was missed in code review.
You may read a struct or other data from an untrusted external source, where that struct contains an enum field, and forget to properly validate it. The untrusted external source could even be a file saved with an older version of your program, where the enum had a different set of valid values.
You may have an uninitialized enum somewhere.
Even something as simple as this:
enum A {X = 1, Y, Z};
int main()
{
A foo; // never initialized: foo need not hold X, Y, or Z
switch (foo) {
case X: return 0;
case Y: return 1;
case Z: return 2;
}
}
As to what you should do in the default case, it depends on your project and the specific enum. For example, if enums should always be validated before entering the bulk of your program, thus preventing invalid values, and it's okay to fail if this is violated, then you should probably throw an exception or even call exit, after printing a suitable error message; this is a programming failure caught at run time.
If failing like this is not an option, you should probably at least still try to log it, at least in a debug build, so you can detect the problem.
If invalid values make sense for a particular enum, then handle it as you see fit for that enum according to why it makes sense.
Imagine this grammar:
declaration
: declaration_specifiers ';' { /* allocate AST Node and return (1) */}
| declaration_specifiers init_declarator_list ';' { /* allocate AST Node and return (2)*/}
;
init_declarator_list
: init_declarator { /* alloc AST Node and return (3) */}
| init_declarator_list ',' init_declarator { /* allocate AST Node and return (4) */}
;
Now imagine there is an error at the ',' token. So we have so far:
declaration -> declaration_specifiers init_declarator_list -> init_declarator_list ',' /*error*/
What happens here?
Does bison execute action (4)? And (2)? If bison does not execute (4) but does execute (2), what is the value of $3? How can I set a default value for $ variables?
How can I properly delete the AST I built before the error?
bison only executes an action when the action's production is reduced, which means that it must have exactly matched the input, unless it is an error production in which case a relaxed matching form is used. (See below.) So you can be assured that if an action is performed, then the various semantic values associated with its terminals and non-terminals are the result of the lexer or their respective actions.
During error recovery, however, bison will automatically discard semantic values from the stack. With reasonably recent bison versions, you can specify an action to be performed when a value is discarded using the %destructor declaration. (See the bison manual for details.) You can specify a destructor either by type or by symbol (or both, but the per-symbol destructor takes precedence.)
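As a sketch of what such declarations can look like (the %union member name and the free_ast helper are hypothetical, not taken from the question's grammar):

```
%union {
    struct ast *node;   /* hypothetical AST pointer type */
}

%type <node> declaration init_declarator_list init_declarator

/* Run by bison whenever one of these values is discarded during
   error recovery, so discarded AST nodes are freed. */
%destructor { free_ast($$); } declaration init_declarator_list init_declarator
```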
The %destructor action will be run whenever bison discards a semantic value. Roughly speaking, discarding a semantic value means that your program never had a chance to deal with the semantic value. It does not apply to values popped off the stack when a production is reduced, even if there is no explicit action associated with the reduction. A complete definition of "discarded" is at the end of the bison manual section cited earlier.
Without error productions, there is really not much possible in the way of error recovery other than discarding the entire stack and any lookahead symbols (which bison will do automatically) and then terminating the parse. You can do a bit better by adding error productions to your grammar. An error production includes the special token error; this token matches an empty sequence precisely in the case that there is no other possible match. Unlike normal productions, error productions do not need to be immediately visible; bison will discard states (and corresponding values) from the stack until it finds a state with an error transition, or it reaches the end of the stack. Also, the terminal following error in the error production does not need to be the lookahead token; bison will discard lookahead tokens (and corresponding values) until it is able to continue with the error production (or it reaches the end of the input). See the handy manual for a longer description of the process (or read about it in the Dragon book, if you have a copy nearby).
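For example, a sketch of such an error production (resynchronizing on ';' is a common recovery choice; the action shown is illustrative, not from the original grammar):

```
declaration
    : declaration_specifiers ';'
    | declaration_specifiers init_declarator_list ';'
    | error ';'   { yyerrok; /* skip to the next ';' and resume parsing */ }
    ;
```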
There are several questions here.
Bison detects an error by being in a parse state in which there is no action (shift or reduce) for the current lookahead token. In your example that would be in the state after shifting the ',' in init_declarator_list. In that state, only tokens in FIRST(init_declarator) will be valid, so any other token will cause an error.
Actions in the bison code are executed when the corresponding rule is reduced, so action (4) will never be called -- the parse never got far enough to reduce that rule. Action (3) did run when its rule was reduced, which happened before the , was shifted into the state where the error was detected.
After having an error (and calling yyerror with an error message), the parser will attempt to recover by popping states off the stack, looking for one in which the special error token can be shifted. As it pops and discards states, it will call the %destructor action for symbols corresponding to those states, so you can use that to clean things up (free memory) if needed.
In your case, it looks like there are no error rules, so there are no states in which an error token can be shifted. So it will pop all states and then return failure from yyparse. If it does find a state that can shift an error token, it stops popping there, shifts the error token, and attempts to continue parsing in error recovery mode. While in error recovery mode, it counts how many tokens (other than the error token) it has shifted since it last had an error. If it has shifted fewer than 3 tokens before hitting another error, it will not call yyerror for the new error. In addition, if it has shifted 0 tokens, it will try to recover from the new error by reading and throwing away input tokens (instead of popping states) until it finds one that can be handled by the current state. As it discards tokens, it calls the %destructor for those tokens, so again you can clean up anything that needs cleaning.
So to answer your last question, you can use a %destructor declaration to delete stuff when an error occurs. The %destructor is called exactly once for each item that is discarded without being passed to a bison action. Items that are passed to actions (as $1, $2, ... in the action) will never have the %destructor called for them, so if you don't need them after the action, you should delete them there.