Boost Qi Composing rules using Functions - c++

I'm trying to define some boost::spirit::qi parsers for multiple subsets of a language with minimal code duplication. To do this, I created a few basic rule-building functions. The original parser works fine, but once I started using the composing functions, my parsers stopped working.
The general language is of the form:
A B: C
There are subsets of the language where A, B, or C must be specific types, such as A is an int while B and C are floats. Here is the parser I used for that sub language:
using entry = boost::tuple<int, float, float>;
template <typename Iterator>
struct sublang : grammar<Iterator, entry(), ascii::space_type>
{
sublang() : sublang::base_type(start)
{
start = int_ >> float_ >> ':' >> float_;
}
rule<Iterator, entry(), ascii::space_type> start;
};
But since there are many subsets, I tried to create a function to build my parser rules:
template<typename AttrName, typename Value>
auto attribute(AttrName attrName, Value value)
{
return attrName >> ':' >> value;
}
So that I could build parsers for each subset more easily without duplicate information:
// in sublang
start = int_ >> attribute(float_, float_);
This fails however and I'm not sure why. In my clang testing, parsing just fails. In g++, it seems the program crashes.
Here's the full example code: http://coliru.stacked-crooked.com/a/8636f19b2e9bff8d
What is wrong with the current code and what would be the correct approach for this problem? I would like to avoid specifying the grammar of attributes and other elements in each sublanguage parser.

Quite simply: using auto with Spirit (or any EDSL based on Boost Proto and Boost Phoenix) is most likely Undefined Behaviour¹
Now, you can usually fix this using
BOOST_SPIRIT_AUTO
boost::proto::deep_copy
the new facility that's coming in the most recent version of Boost (TODO add link)
In this case,
template<typename AttrName, typename Value>
auto attribute(AttrName attrName, Value value) {
return boost::proto::deep_copy(attrName >> ':' >> value);
}
fixes it: Live On Coliru
Alternatively
you could use qi::lazy[] with inherited attributes.
I do very similar things in the prop_key rule in Reading JSON file with C++ and BOOST.
you could have a look at the Keyword List Operator from the Spirit Repository. It's designed to allow easier construction of grammars like:
no_constraint_person_rule %=
kwd("name")['=' > parse_string ]
/ kwd("age") ['=' > int_]
/ kwd("size") ['=' > double_ > 'm']
;
This you could potentially combine with the Nabialek Trick. I'd search the answers on SO for examples. (One is Grammar balancing issue)
¹ Except for entirely stateless actors (Eric Niebler on this) and expression placeholders. See e.g.
Assigning parsers to auto variables
undefined behaviour somewhere in boost::spirit::qi::phrase_parse
C++ Boost qi recursive rule construction
boost spirit V2 qi bug associated with optimization level
Some examples
Define parsers parameterized with sub-parsers in Boost Spirit
Generating Spirit parser expressions from a variadic list of alternative parser expressions

Related

Boost Spirit Qi: Compile error on slight rule change

I'm writing a little compiler just for fun and I'm using Boost Spirit Qi to describe my grammar. Now I want to make a minor change in the grammar to prepare some further additions. Unfortunately these changes won't compile and I would like to understand why this is the case.
Here is a snippet from the code I want to change. I hope the provided information is enough to understand the idea. The complete code is a bit large, but if you want to look at it or even test it (Makefile and Travis CI is provided), see https://github.com/Kruecke/BFGenerator/blob/8f66aa5/bf/compiler.cpp#L433.
typedef boost::variant<
function_call_t,
variable_declaration_t,
variable_assignment_t,
// ...
> instruction_t;
struct grammar : qi::grammar<iterator, program_t(), ascii::space_type> {
grammar() : grammar::base_type(program) {
instruction = function_call
| variable_declaration
| variable_assignment
// | ...
;
function_call = function_name >> '(' > -(variable_name % ',') > ')' > ';';
// ...
}
qi::rule<iterator, instruction::instruction_t(), ascii::space_type> instruction;
qi::rule<iterator, instruction::function_call_t(), ascii::space_type> function_call;
// ...
};
So far, everything is just working fine. Now I want to move the parsing of the trailing semicolon (> ';') from the function_call rule to the instruction rule. My code now looks like this:
struct grammar : qi::grammar<iterator, program_t(), ascii::space_type> {
grammar() : grammar::base_type(program) {
instruction = (function_call > ';') // Added trailing semicolon
| variable_declaration
| variable_assignment
// | ...
;
// Removed trailing semicolon here:
function_call = function_name >> '(' > -(variable_name % ',') > ')';
// ...
}
From my understanding the rules haven't really changed because the character parser ';' doesn't yield any attribute and so it shouldn't matter where this parser is positioned. However, this change won't compile:
/usr/include/boost/spirit/home/support/container.hpp:278:13: error: no matching function for call to ‘std::basic_string<char>::insert(std::basic_string<char>::iterator, const bf::instruction::function_call_t&)’
c.insert(c.end(), val);
^
(This error comes from the instruction = ... line.)
Why is this change not compiling? I'm rather looking for an explanation to understand what's going on than a workaround.
Ok, so after looking at this closely, you are trying to insert multiple strings into your function_call_t type, which is a fusion sequence that can be converted to from a single std::string. However, you are probably going to run into issues with your function_call rule because its attribute is actually tuple<std::string, optional<vector<std::string>>>. I'd imagine that Spirit is having issues flattening that structure out and that is causing your issue; however, I don't have a compiler to test it out at the moment.

parsing into std::vector<string> with Spirit Qi, getting segfaults or assert failures

I am using Spirit Qi as my parser, to parse mathematical expressions into an expression tree. I keep track of such things as the types of the symbols which are encountered as I parse, and which must be declared in the text I am parsing. Namely, I am parsing Bertini input files, a simple-ish example of which is here, a complicated example is here, and for completeness purposes, as below:
%input: our first input file
variable_group x,y;
function f,g;
f = x^2 - 1;
g = y^2 - 4;
END;
The grammar I have been working on will ideally
find declaration statements, and then parse the following comma-separated list of symbols of the type being declared, and store the resulting vector of symbols in the class object being parsed into; e.g. variable_group x, y;
find a previously declared symbol, which is followed by an equals sign, and is the definition of that symbol as an evaluatable mathematical object; e.g. f = x^2 - 1; This part I mostly have under control.
find a not-previously declared symbol followed by =, and parse it as a subfunction. I think I can handle this, too.
The problem I have been struggling to solve seems like it is so trivial, yet after hours of searching, I still haven't gotten there. I've read dozens of Boost Spirit mailing list posts, SO posts, the manual, and the headers for Spirit themselves, yet still don't quite grok a few critical things about Spirit Qi parsing.
Here is the problematic basic grammar definition, which would go in system_parser.hpp:
#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template<typename Iterator>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type>
{
SystemParser() : SystemParser::base_type(variable_group_)
{
namespace phx = boost::phoenix;
using qi::_1;
using qi::_val;
using qi::eps;
using qi::lit;
qi::symbols<char,int> encountered_variables;
qi::symbols<char,int> declarative_symbols;
declarative_symbols.add("variable_group",0);
// wraps the vector between its appropriate declaration and line termination.
BOOST_SPIRIT_DEBUG_NODE(variable_group_);
debug(variable_group_);
variable_group_.name("variable_group_");
variable_group_ %= lit("variable_group") >> genericvargp_ >> lit(';');
// creates a vector of strings
BOOST_SPIRIT_DEBUG_NODE(genericvargp_);
debug(genericvargp_);
genericvargp_.name("genericvargp_");
genericvargp_ %= new_variable_ % ',';
// will in the future make a shared pointer to an object using the string
BOOST_SPIRIT_DEBUG_NODE(new_variable_);
debug(new_variable_);
new_variable_.name("new_variable_");
new_variable_ %= unencountered_symbol_;
// this rule gets a string.
BOOST_SPIRIT_DEBUG_NODE(unencountered_symbol_);
debug(unencountered_symbol_);
unencountered_symbol_.name("unencountered_symbol");
unencountered_symbol_ %= valid_variable_name_ - ( encountered_variables | declarative_symbols);
// get a string which fits the naming rules.
BOOST_SPIRIT_DEBUG_NODE(valid_variable_name_);
valid_variable_name_.name("valid_variable_name_");
valid_variable_name_ %= +qi::alpha >> *(qi::alnum | qi::char_('_') | qi::char_('[') | qi::char_(']') );
}
// rule declarations. these are member variables for the parser.
qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > variable_group_;
qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > genericvargp_;
qi::rule<Iterator, std::string(), ascii::space_type> new_variable_;
qi::rule<Iterator, std::string(), ascii::space_type > unencountered_symbol_;// , ascii::space_type
// the rule which determines valid variable names
qi::rule<Iterator, std::string()> valid_variable_name_;
};
and some code which uses it:
#include "system_parsing.hpp"
int main(int argc, char** argv)
{
std::vector<std::string> V;
std::string str = "variable_group x, y, z;";
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
SystemParser<std::string::const_iterator> S;
bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);
std::cout << "the unparsed string:\n" << std::string(iter,end);
return 0;
}
It compiles under Clang 4.9.x on OSX just fine. When I run it, I get:
Assertion failed: (px != 0), function operator->, file /usr/local/include/boost/smart_ptr/shared_ptr.hpp, line 648.
Alternately, if I use expectation operator > rather than >> in the definition of the variable_group_ rule, I get our dear old friend Segmentation fault: 11.
In my learning process, I've come across such excellent posts as how to tell the type spirit is trying to generate, attribute propagation, how to interact with symbols, an example of infinite left recursion which led to a segfault, information on parsing into classes, not structs which has a link to using Customization points (yet the links contain no examples), the Nabialek trick which couples keywords to actions, and perhaps most relevant for what I am trying to do, dynamic difference parsing. That last one is certainly something I need: the set of already-encountered symbols starts empty and grows, and I disallow using them as another type later. That is, the rules for parsing are dynamic.
So here's where I am at. My current problem is the assert/segfault generated by this particular example. However, I am unclear on some things, and need guiding advice, which I just haven't put together from any of the sources I have consulted, and the request for which hopefully makes this SO question disjoint from others previously asked:
When is it appropriate to use lexeme? I just don't know when to use lexeme, and not.
What are some guidelines for when to use > rather than >>?
I've seen many Fusion adapt examples where there is a struct to be parsed into, and a set of rules to do so. My input files will possibly have multiple occurrences of declarations of function, variables, etc, which all need to go the same place, so I need to be able to add to fields of the terminal class object into which I am parsing, in any order, multiple times. I think I would like to use getter/setters for the class object, so that parsing is not the only pathway to object construction. Is this a problem?
Any kind advice for this beginner is most welcome.
You reference the symbols variables. But they are locals so they don't exist once the constructor returns. This invokes Undefined Behaviour. Anything can happen.
Make the symbol tables members of the class.
I also simplified the dance around:
the skippers (see Boost spirit skipper issues). That link also answers your "When is it appropriate to use lexeme[]?" question. In your sample you lacked the lexeme[] around encountered_variables|declarative_symbols, for example.
the debug macros
the operator%=, and some generally unused stuff
the symbols<> mapped type: guessing you didn't need it (because the int wasn't consumed), I simplified the initialization there
Demo
Live On Coliru
#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#define BOOST_SPIRIT_DEBUG 1
#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Iterator, typename Skipper = ascii::space_type>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), Skipper> {
SystemParser() : SystemParser::base_type(variable_group_)
{
declarative_symbols += "variable_group";
variable_group_ = "variable_group" >> genericvargp_ >> ';';
genericvargp_ = new_variable_ % ',';
valid_variable_name_ = qi::alpha >> *(qi::alnum | qi::char_("_[]"));
unencountered_symbol_ = valid_variable_name_ - (encountered_variables|declarative_symbols);
new_variable_ = unencountered_symbol_;
BOOST_SPIRIT_DEBUG_NODES((variable_group_) (valid_variable_name_) (unencountered_symbol_) (new_variable_) (genericvargp_))
}
private:
qi::symbols<char, qi::unused_type> encountered_variables, declarative_symbols;
// rule declarations. these are member variables for the parser.
qi::rule<Iterator, std::vector<std::string>(), Skipper> variable_group_;
qi::rule<Iterator, std::vector<std::string>(), Skipper> genericvargp_;
qi::rule<Iterator, std::string()> new_variable_;
qi::rule<Iterator, std::string()> unencountered_symbol_; // , Skipper
// the rule which determines valid variable names
qi::rule<Iterator, std::string()> valid_variable_name_;
};
//#include "system_parsing.hpp"
int main() {
using It = std::string::const_iterator;
std::string const str = "variable_group x, y, z;";
SystemParser<It> S;
It iter = str.begin(), end = str.end();
std::vector<std::string> V;
bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);
if (s)
{
std::cout << "Parse succeeded: " << V.size() << "\n";
for (auto& s : V)
std::cout << " - '" << s << "'\n";
}
else
std::cout << "Parse failed\n";
if (iter!=end)
std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
}
Prints
Parse succeeded: 3
- 'x'
- 'y'
- 'z'

Boost Spirit optional parser and backtracking

Why does this parser leave 'b' in the attribute, even though the optional part wasn't matched?
using namespace boost::spirit::qi;
std::string str = "abc";
auto a = char_("a");
auto b = char_("b");
qi::rule<std::string::iterator, std::string()> expr;
expr = +a >> -(b >> +a);
std::string res;
bool r = qi::parse(
str.begin(),
str.end(),
expr >> lit("bc"),
res
);
It parses successfully, but res is "ab".
If I parse "abac" with expr alone, the optional part is matched and the attribute is "aba".
Likewise with "aac": the optional part never starts to match and the attribute is "aa".
But with "ab", the attribute is "ab", even though b gets backtracked and, as in the example, is matched by the next parser.
UPD
With expr.name("expr"); and debug(expr); I got
<expr>
<try>abc</try>
<success>bc</success>
<attributes>[[a, b]]</attributes>
</expr>
Firstly, it's UB to use the auto variables to keep the expression templates, because they hold references to the temporaries "a" and "b" [1].
Instead write
expr = +qi::char_("a") >> -(qi::char_("b") >> +qi::char_("a"));
or, if you insist:
auto a = boost::proto::deep_copy(qi::char_("a"));
auto b = boost::proto::deep_copy(qi::char_("b"));
expr = +a >> -(b >> +a);
Now, noticing the >> lit("bc") part hiding in the parse call suggests you may expect Spirit to backtrack over successfully matched tokens when a parse failure happens down the road.
That doesn't happen: Spirit generates PEG grammars, and always greedily matches from left to right.
On to the sample given: "ab" results because, even though backtracking does occur, the effects on the attribute are not rolled back without qi::hold: Live On Coliru
Container attributes are passed along by reference and the effects of previous (successful) expressions are not rolled back unless you tell Spirit to. This way, you "pay for what you use" (as copying temporaries all the time would be costly).
See e.g.
boost::spirit::qi duplicate parsing on the output
Understanding Boost.spirit's string parser
Boost spirit revert parsing
<a>
<try>abc</try>
<success>bc</success>
<attributes>[a]</attributes>
</a>
<a>
<try>bc</try>
<fail/>
</a>
<b>
<try>bc</try>
<success>c</success>
<attributes>[b]</attributes>
</b>
<a>
<try>c</try>
<fail/>
</a>
<bc>
<try>bc</try>
<success></success>
<attributes>[]</attributes>
</bc>
Success: 'ab'
[1] see here:
Assigning parsers to auto variables
Generating Spirit parser expressions from a variadic list of alternative parser expressions
boost spirit V2 qi bug associated with optimization level
Quoting @sehe from this SO question:
A string attribute is a container attribute and many elements could be
assigned into it by different parser subexpressions. Now for
efficiency reasons, Spirit doesn't rollback the values of emitted
attributes on backtracking.
So I put the body of the optional parser inside hold, and that fixed it.
expr = +qi::char_("a") >> -(qi::hold[qi::char_("b") >> +qi::char_("a")]);
For more information, see the mentioned question and the hold docs.

Parsing nested data in boost-spirit

I need to parse a text tree:
std::string data = "<delimiter>field1a fieald1b fieald1c<delimiter1>subfield11<delimiter1>subfieald12<delimiter1>subfieald13 ... <delimiter>field2a fieald2b fieald2c<delimiter1>subfield21<delimiter1>subfieald22<delimiter1>subfieald23 ...";
where <delimiter> and <delimiter1> are multi-character strings, not single chars.
Is it possible to tokenize this string with boost::spirit?
The list parser is your friend:
namespace qi = boost::spirit::qi;
// tokenize on '<delimiter1>' and return the vector; a field stops at either delimiter
// (note: the qi::space skipper drops whitespace inside fields)
qi::rule<std::string::iterator, qi::space_type, std::vector<std::string>()> fields =
*(qi::char_ - "<delimiter1>" - "<delimiter>") % "<delimiter1>";
std::string data("<delimiter>field1a fieald1b ...");
std::vector<std::vector<std::string> > fields_data;
// tokenize on '<delimiter>' and return a vector of vectors
qi::phrase_parse(data.begin(), data.end(),
fields % "<delimiter>", qi::space, fields_data);
You might need a recent version of Spirit for this to work (Boost V1.47 or SVN trunk).
Yes, you could use Spirit to parse this format, but it seems to me to be much more than you need.
I would just code the tokenizer myself directly using std::string functions. Alternatively, boost::regex should do this very easily for you.

Parsing a pair of ints with boost spirit

I have the following code:
std::string test("1.1");
std::pair<int, int> d;
bool r = qi::phrase_parse(
test.begin(),
test.end(),
qi::int_ >> '.' >> qi::int_,
space,
d
);
So I'm trying to parse the string test and place the result in the std::pair d. However it is not working, I suspect it has to do with the Compound Attribute Rules.
Any hints on how to get this working?
The compiler error is the following:
error: no matching function for call to 'std::pair::pair(const int&)'
It should work. What people forget very often is to add a
#include <boost/fusion/include/std_pair.hpp>
to their list of includes. This is necessary to make std::pair a full blown Fusion citizen.