Boost Spirit: Parse char_ with changing local variable value

I want to implement a grammar that requires parsing instance names and paths, where a path is a list of instance names separated by a divider. The divider can be either . (period) or / (slash) given in the input file before the paths are listed, e.g.:
DIVIDER .
a.b.c
x.y.z
Once set, the divider never changes for the whole file (i.e. if set to ., encountering a path like a/b/c should not parse correctly). Since I don't know what the divider is in advance, I'm thinking about storing it in a variable of my grammar and using that value in the corresponding char_ parsers (of course, the actual grammar is much more complex, but this is the part where I'm having trouble).
This is somewhat similar to this question: Boost spirit using local variables, but it is not quite what I want, since using the Nabialek trick would still allow "invalid" paths to parse after the divider is set.
I'm not asking for a complete solution here, but my question is essentially this: Can I parse values into members of my grammar and then use these values for further parsing of remaining input?

I'd use an inherited attribute:
qi::rule<It, std::string(char)> element = *~qi::char_(qi::_r1);
qi::rule<It, std::vector<std::string>(char)> path = element(qi::_r1) % qi::char_(qi::_r1);
// use it like:
std::vector<std::string> data;
bool ok = qi::parse(f, l, path('/'), data);
Alternatively you /can/ indeed bind to a local variable:
char delim = '/';
qi::rule<It, std::string()> element = *~qi::char_(delim);
qi::rule<It, std::vector<std::string>()> path = element % qi::char_(delim);
// use it like:
std::vector<std::string> data;
bool ok = qi::parse(f, l, path, data);
If you need it to be dynamic, use boost::phoenix::ref:
char delim = '/';
qi::rule<It, std::string()> element = *~qi::char_(boost::phoenix::ref(delim));
qi::rule<It, std::vector<std::string>()> path = element % qi::char_(boost::phoenix::ref(delim));
// use it like:
std::vector<std::string> data;
bool ok = qi::parse(f, l, path, data);

Related

Boost Spirit Qi: Compile error on slight rule change

I'm writing a little compiler just for fun and I'm using Boost Spirit Qi to describe my grammar. Now I want to make a minor change in the grammar to prepare some further additions. Unfortunately these changes won't compile and I would like to understand why this is the case.
Here is a snippet from the code I want to change. I hope the provided information is enough to understand the idea. The complete code is a bit large, but if you want to look at it or even test it (a Makefile and Travis CI configuration are provided), see https://github.com/Kruecke/BFGenerator/blob/8f66aa5/bf/compiler.cpp#L433.
typedef boost::variant<
    function_call_t,
    variable_declaration_t,
    variable_assignment_t,
    // ...
> instruction_t;
struct grammar : qi::grammar<iterator, program_t(), ascii::space_type> {
    grammar() : grammar::base_type(program) {
        instruction = function_call
                    | variable_declaration
                    | variable_assignment
                    // | ...
                    ;
        function_call = function_name >> '(' > -(variable_name % ',') > ')' > ';';
        // ...
    }
    qi::rule<iterator, instruction::instruction_t(), ascii::space_type> instruction;
    qi::rule<iterator, instruction::function_call_t(), ascii::space_type> function_call;
    // ...
};
So far, everything is just working fine. Now I want to move the parsing of the trailing semicolon (> ';') from the function_call rule to the instruction rule. My code now looks like this:
struct grammar : qi::grammar<iterator, program_t(), ascii::space_type> {
    grammar() : grammar::base_type(program) {
        instruction = (function_call > ';') // Added trailing semicolon
                    | variable_declaration
                    | variable_assignment
                    // | ...
                    ;
        // Removed trailing semicolon here:
        function_call = function_name >> '(' > -(variable_name % ',') > ')';
        // ...
    }
From my understanding the rules haven't really changed because the character parser ';' doesn't yield any attribute and so it shouldn't matter where this parser is positioned. However, this change won't compile:
/usr/include/boost/spirit/home/support/container.hpp:278:13: error: no matching function for call to ‘std::basic_string<char>::insert(std::basic_string<char>::iterator, const bf::instruction::function_call_t&)’
c.insert(c.end(), val);
^
(This error comes from the instruction = ... line.)
Why is this change not compiling? I'm rather looking for an explanation to understand what's going on than a workaround.
OK, so after looking at this closely: you are trying to insert multiple strings into your function_call_t type, which is a fusion sequence that can be converted to from a single std::string. However, you are probably going to run into issues with your function_call rule because its attribute is actually tuple<std::string, optional<vector<std::string>>>. I'd imagine that Spirit has trouble flattening that structure and that this causes the error; however, I don't have a compiler at hand to test it at the moment.

Boost Qi Composing rules using Functions

I'm trying to define some Boost::spirit::qi parsers for multiple subsets of a language with minimal code duplication. To do this, I created a few basic rule building functions. The original parser works fine, but once I started to use the composing functions, my parsers no longer seem to work.
The general language is of the form:
A B: C
There are subsets of the language where A, B, or C must be specific types, such as A is an int while B and C are floats. Here is the parser I used for that sub language:
using entry = boost::tuple<int, float, float>;
template <typename Iterator>
struct sublang : grammar<Iterator, entry(), ascii::space_type>
{
    sublang() : sublang::base_type(start)
    {
        start = int_ >> float_ >> ':' >> float_;
    }
    rule<Iterator, entry(), ascii::space_type> start;
};
But since there are many subsets, I tried to create a function to build my parser rules:
template<typename AttrName, typename Value>
auto attribute(AttrName attrName, Value value)
{
    return attrName >> ':' >> value;
}
So that I could build parsers for each subset more easily without duplicate information:
// in sublang
start = int_ >> attribute(float_, float_);
This fails however and I'm not sure why. In my clang testing, parsing just fails. In g++, it seems the program crashes.
Here's the full example code: http://coliru.stacked-crooked.com/a/8636f19b2e9bff8d
What is wrong with the current code and what would be the correct approach for this problem? I would like to avoid specifying the grammar of attributes and other elements in each sublanguage parser.
Quite simply: using auto with Spirit (or any EDSL based on Boost Proto and Boost Phoenix) is most likely Undefined Behaviour¹
Now, you can usually fix this using
BOOST_SPIRIT_AUTO
boost::proto::deep_copy
the new facility that's coming in the most recent version of Boost (TODO add link)
In this case,
template<typename AttrName, typename Value>
auto attribute(AttrName attrName, Value value) {
    return boost::proto::deep_copy(attrName >> ':' >> value);
}
fixes it: Live On Coliru
Alternatively
you could use qi::lazy[] with inherited attributes.
I do very similar things in the prop_key rule in Reading JSON file with C++ and BOOST.
you could have a look at the Keyword List Operator from the Spirit Repository. It's designed to allow easier construction of grammars like:
no_constraint_person_rule %=
kwd("name")['=' > parse_string ]
/ kwd("age") ['=' > int_]
/ kwd("size") ['=' > double_ > 'm']
;
This you could potentially combine with the Nabialek Trick. I'd search the answers on SO for examples. (One is Grammar balancing issue)
¹ Except for entirely stateless actors (Eric Niebler on this) and expression placeholders. See e.g.
Assigning parsers to auto variables
undefined behaviour somewhere in boost::spirit::qi::phrase_parse
C++ Boost qi recursive rule construction
boost spirit V2 qi bug associated with optimization level
Some examples
Define parsers parameterized with sub-parsers in Boost Spirit
Generating Spirit parser expressions from a variadic list of alternative parser expressions

Spirit Qi: Completely ignoring output of some rules

I'm parsing some input that is vaguely structured like C-ish code. Like this:
Name0
{
    Name1
    {
        //A COMMENT!!
        Param0 *= 2
        Param2 = "lol"
    }
}
Part of that is comments, which I want to totally ignore (and it's not working). I consider two things to be a node, the named scopes (category rule) like Name0 {} and the values (param rule) like Param0 *= 2... then there is comment. I've tried setting things up like this:
typedef boost::variant<boost::recursive_wrapper<Category>, Param> Node;
qi::rule<Iterator, Node(), ascii::space_type> node;
So the node rule puts either a Category or a Param in a variant. Here are the other rules (I've omitted some rules that don't really matter for this):
qi::rule<Iterator> comment; //comment has no return type
qi::rule<Iterator, Category(), ascii::space_type> category;
qi::rule<Iterator, Param(), ascii::space_type> param;
And their actual code:
comment = "//" >> *(char_ - eol);
param %=
tagstring
>> operators
>> value;
category %=
tagstring
>> '{'
>> *node
> '}';
node %= comment | category | param;
comment is set up to use = instead of %=, and it has no return type. However, comments end up creating null Category entries in my output Nodes wherever they show up. I've tried moving comment out of the node rule and into category like this:
category %=
tagstring
>> '{'
>> *(comment | node)
> '}';
And various other things, but those null entries keep popping up. I had to make comment output a string and put std::string in my Node variant just to sorta catch them, but that messes up my ability to stick in commenting in other parts of my rules (unless I actually grab the string in every location).
How can I completely ignore the comment and have it not show up in any output in any way?
edit: You'd think omit would do it, but didn't seem to change anything...
edit 2: Referencing this SO answer, I have a shaky solution in this:
node %= category | param;
category %=
tagstring
>> '{'
>> *comment >> *(node >> *comment)
> '}';
However, I want to try to stick comments into all sorts of places (between tagstring and {, in my unshown root rule between root categorys, etc). Is there a simpler method than this? I was hoping it could be done with a simple >> commentwrapper plugged into wherever I wanted...
Alright, so making your own skipper isn't too bad. And it elegantly solves this commenting problem, just as Mike M said. I define my rules in a struct called Parser that is templated with an Iterator. Had to make some adjustments to use the skipper. First, here is the skipper which is defined in Parser with all my other rules:
typedef qi::rule<Iterator> Skipper;
Skipper skipper;
So skipper is a rule of type Skipper. Here is what my Parser struct looked like originally, where it was using the ascii::space rule of type ascii::space_type as its skipper, which IS NOT the same type as qi::rule<Iterator> that skipper is based on!
struct Parser : qi::grammar<Iterator, std::vector<Category>(), ascii::space_type>
{
    qi::rule<Iterator, std::vector<Category>(), ascii::space_type> root;
    ...
So every instance of ascii::space_type in the rule templates must be replaced with Skipper! That includes other rules besides the root that is shown here, such as param and category from my question. Leaving any remnant of the old ascii::space_type behind gives cryptic compiler errors.
struct Parser : qi::grammar<Iterator, std::vector<Category>(), qi::rule<Iterator>>
{
    typedef qi::rule<Iterator> Skipper;
    Skipper skipper;
    qi::rule<Iterator, std::vector<Category>(), Skipper> root;
    ...
The original skipper was merely space, mine is now an alternative of space and comment. No old functionality (space skipping) is lost.
skipper = space | comment;
Then the phrase_parse call needs to be adjusted from this old version that used ascii::space:
bool r = phrase_parse(iter, end, parser, ascii::space, result);
to
bool r = phrase_parse(iter, end, parser, parser.skipper, result);
And now comments disappear as easily as white space. Awesome.

Parse elements into vector using boost::spirit, using semicolon or newlines as separators

I'd like to parse a sequence of integers into an std::vector<int>, using boost::spirit. The integers may be separated by a semicolon or a newline.
But this grammar doesn't compile:
typedef std::vector<int> IntVec;
template <typename Iterator, typename Skipper>
struct MyGrammar : qi::grammar<Iterator, IntVec(), Skipper> {
    MyGrammar() : MyGrammar::base_type(start) {
        start = +(qi::int_
                  >> (";" | qi::no_skip(qi::eol)));
    }
    qi::rule<Iterator, IntVec(), Skipper> start;
};
To be clear, I want to parse the following input, for example,
1; 2; 3
4 ; 5
into one vector (1,2,3,4,5). How can I do that and why does my version not compile?
Can I somehow write the separator ("semicolon or newline") as its own rule? What would its return type be? Some kind of null value?
It looks like the skipper is being applied when checking the semicolon, and so the skip characters (including newline) have already been consumed once qi::no_skip[qi::eol] is reached. The following is working for me, with the no_skip token first:
start = qi::int_ % (qi::no_skip[qi::eol] | ';');
I'm using % so that the final integer does not need to be followed by a semicolon or end-of-line.

Parsing nested data in boost-spirit

I need to parse a text tree like this:
std::string data = "<delimiter>field1a fieald1b fieald1c<delimiter1>subfield11<delimiter1>subfieald12<delimiter1>subfieald13 ... <delimiter>field2a fieald2b fieald2c<delimiter1>subfield21<delimiter1>subfieald22<delimiter1>subfieald23 ...";
where <delimiter> and <delimiter1> are each multi-character strings, not single chars.
Is it possible to tokenize this string with boost::spirit?
The list parser is your friend:
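namespace qi = boost::spirit::qi;
// tokenize on '<delimiter1>' and return the vector; stop at either delimiter
// so that one field list never runs past the next '<delimiter>'
qi::rule<std::string::iterator, qi::space_type, std::vector<std::string>()> fields =
    *(qi::char_ - "<delimiter1>" - "<delimiter>") % "<delimiter1>";
std::string data("<delimiter>field1a fieald1b ...");
std::vector<std::vector<std::string> > fields_data;
// phrase_parse takes the begin iterator by non-const reference, so it needs an lvalue
std::string::iterator first = data.begin();
// tokenize on '<delimiter>' and return a vector of vectors
qi::phrase_parse(first, data.end(),
    fields % "<delimiter>", qi::space, fields_data);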
You might need a recent version of Spirit for this to work (Boost V1.47 or SVN trunk).
Yes, you could use Spirit to parse this format, but it seems to me to be much more than you need.
I would just code the tokenizer myself directly using std::string functions. Alternatively, boost::regex should do this very easily for you.
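A minimal sketch of the hand-written approach, using only std::string::find and substr. The function names split and tokenize are made up for illustration, and the decision to drop the empty piece in front of the leading <delimiter> is an assumption about the desired output:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits s on every occurrence of the multi-character delimiter delim.
std::vector<std::string> split(const std::string& s, const std::string& delim) {
    std::vector<std::string> parts;
    if (delim.empty()) {  // nothing to split on
        parts.push_back(s);
        return parts;
    }
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = s.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));  // last piece
            break;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + delim.size();
    }
    return parts;
}

// Outer split on "<delimiter>", inner split on "<delimiter1>".
std::vector<std::vector<std::string>> tokenize(const std::string& data) {
    std::vector<std::vector<std::string>> records;
    for (const std::string& record : split(data, "<delimiter>")) {
        if (record.empty()) continue;  // assumption: ignore empty records
        records.push_back(split(record, "<delimiter1>"));
    }
    return records;
}
```

Unlike the Spirit version with a space skipper, this keeps the whitespace inside each field untouched.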