Spirit X3, two rules do not compile after being combined into one - c++

I am currently learning how to use x3. As the title states, I have had success creating a grammar with a few simple rules, but upon combining two of these rules into one, the code no longer compiles. Here is the code for the AST portion:
namespace x3 = boost::spirit::x3;
struct Expression;
struct FunctionExpression {
std::string functionName;
std::vector<x3::forward_ast<Expression>> inputs;
};
struct Expression: x3::variant<int, double, bool, FunctionExpression> {
using base_type::base_type;
using base_type::operator=;
};
The rules I have created parse input formatted as {rangeMin, rangeMax}:
rule<struct basic_exp_class, ast::Expression> const
basic_exp = "basic_exp";
rule<struct exp_pair_class, std::vector<ast::Expression>> const
exp_pair = "exp_pair";
rule<struct range_class, ast::FunctionExpression> const
range = "range";
auto const basic_exp_def = double_ | int_ | bool_;
auto const exp_pair_def = basic_exp >> ',' >> basic_exp;
auto const range_def = attr("computeRange") >> '{' >> exp_pair >> '}';
BOOST_SPIRIT_DEFINE(basic_exp, exp_pair, range);
This code compiles fine. However, if I try to inline the exp_pair rule into the range_def rule, like so:
rule<struct basic_exp_class, ast::Expression> const
basic_exp = "basic_exp";
rule<struct range_class, ast::FunctionExpression> const
range = "range";
auto const basic_exp_def = double_ | int_ | bool_;
auto const range_def = attr("computeRange") >> '{' >> (
basic_exp >> ',' >> basic_exp
) >> '}';
BOOST_SPIRIT_DEFINE(basic_exp, range);
The code fails to compile with a very long template error, ending with the line:
spirit/include/boost/spirit/home/x3/operator/detail/sequence.hpp:149:9: error: static assertion failed: Size of the passed attribute is less than expected.
static_assert(
^~~~~~~~~~~~~
The header file also includes this comment above the static_assert:
// If you got an error here, then you are trying to pass
// a fusion sequence with the wrong number of elements
// as that expected by the (sequence) parser.
But I do not see why the code should fail. According to x3's compound attribute rules, the inlined portion in the parentheses should have an attribute of type vector<ast::Expression>, making the overall rule have the type tuple<string, vector<ast::Expression>>, so that it would be compatible with ast::FunctionExpression. The same logic applies to the more verbose three-rule version, the only difference being that I explicitly declared a rule for the inner part and explicitly stated that its attribute needed to be of type vector<ast::Expression>.

Spirit x3 is probably seeing the result of the inlined rule as two separate ast::Expression instead of the std::vector<ast::Expression> required by the ast::FunctionExpression struct.
To solve it, we can use the helper lambda as<T>(), mentioned in another answer, to specify the attribute type of a sub-parser.
And the modified range_def would become:
auto const range_def = attr("computeRange") >> '{' >> as<std::vector<ast::Expression>>(basic_exp >> ',' >> basic_exp) >> '}';

Related

boost variant type collision

Follow-Up Question
So, I've been playing with the Boost Mini C Tutorial.
What I have done is added a rule to parse string literals. The purpose is so that I can parse and compile programs like (functionality already built-in):
int ret(int x) {
return x;
}
int main() {
int x = 5;
return ret(x)*2;
}
As well as (want to add this functionality),
string print(string s) {
return s;
}
int main() {
string foo = "bar";
print(foo);
return 0;
}
Whether the last two examples compile with, say, gcc is inconsequential.
So, the gist of what I added is the following:
Within the file expression_def.hpp (production rule 'quoted_string' has been added):
quoted_string = '"' >> *('\\' >> char_ | ~char_('"')) >> '"'; // ADDED THIS
primary_expr =
uint_
| quoted_string // ADDED THIS
| function_call
| identifier
| bool_
| '(' > expr > ')'
;
Within ast.hpp, the variant type 'std::string' has been added:
typedef boost::variant<
nil
, bool
, unsigned int
, std::string // ADDED THIS
, identifier
, boost::recursive_wrapper<unary>
, boost::recursive_wrapper<function_call>
, boost::recursive_wrapper<expression>
>
operand;
Here is the rule declaration for the addition, as well as the rule it's colliding with:
qi::rule<Iterator, std::string(), skipper<Iterator> > identifier;
qi::rule<Iterator, std::string()> quoted_string; // declaring this without the skipper
// lets us avoid the lexeme[] incantation (thanks @sehe).
The problem now, is that the compiler confuses what should be an 'identifier' for a 'quoted_string' - or actually just a std::string.
My guess is, the fact that they both have a std::string signature return type is the cause of the problem, but I don't know a good workaround here. Additionally, the 'identifier' struct has a data member of type std::string that it is initialized with, so really the compiler cannot tell between the two and the variant std::string ends up being the better match.
Now, if I change std::string to char* like so:
typedef boost::variant<
nil
, bool
, unsigned int
, char* // CHANGED, YET AGAIN
, identifier
, boost::recursive_wrapper<unary>
, boost::recursive_wrapper<function_call>
, boost::recursive_wrapper<expression>
>
operand;
it will compile and work with integers, but then I am unable to parse strings (in fact, VS will call abort()). It should be noted that because each variant type needs an overload, I have something in my code along the lines of:
bool compiler::operator()(std::string const& x)
{
BOOST_ASSERT(current != 0);
current->op(op_string, x);
return true;
}
and
void function::op(int a, std::string const& b)
{
code.push_back(a);
code.push_back(b.size());
for (uintptr_t ch : b)
{
code.push_back(ch);
}
size_ += 2 + b.size();
}
These both work swimmingly when I need to parse strings (of course sacrificing the ability to handle integers).
Their integer equivalents are (and found in compiler.cpp)
bool compiler::operator()(unsigned int x)
{
BOOST_ASSERT(current != 0);
current->op(op_int, x);
return true;
}
and of course:
void function::op(int a, int b)
{
code.push_back(a);
code.push_back(b);
size_ += 2;
}
If I have to change the variant type from std::string to char*, then I have to update the overloads, and because of C legacies, it gets to look a bit ugly.
I understand this might be a bit daunting and not really appealing to comb through the source, but I assure you it really isn't. This compiler tutorial simply pushes bytecode into a vector, which by design only handles integers. I am trying to modify it to handle strings as well, hence the additions and overloads, as well as the need for uintptr_t. Anyone familiar with the material and/or Boost will likely know exactly what they are looking at (ehem, @sehe, ehem!).
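To make the length-prefixed encoding used by function::op(int, std::string const&) easier to follow, here is a small standalone sketch using only the standard library. The names emit_string and read_string and the opcode value 100 are made up for illustration; the tutorial's real op_string value and its virtual machine loop are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical opcode value; the tutorial's real op_string may differ.
constexpr int op_string = 100;

// Appends: opcode, string length, then one int per character.
void emit_string(std::vector<int>& code, const std::string& text) {
    code.push_back(op_string);
    code.push_back(static_cast<int>(text.size()));
    for (unsigned char ch : text) code.push_back(ch);
}

// Reads back a string emitted at position pc and advances pc past it.
std::string read_string(const std::vector<int>& code, std::size_t& pc) {
    assert(pc + 1 < code.size() && code[pc] == op_string);
    const std::size_t length = static_cast<std::size_t>(code[pc + 1]);
    assert(pc + 2 + length <= code.size());
    std::string text;
    for (std::size_t i = 0; i < length; ++i)
        text.push_back(static_cast<char>(code[pc + 2 + i]));
    pc += 2 + length;
    return text;
}
```

With this layout a string of n characters occupies 2 + n slots, which matches the size_ += 2 + b.size() bookkeeping in the overload above.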

How do I convert boost::spirit::qi::lexeme's attribute to std::string?

Consider:
struct s {
AttrType f(const std::string &);
};
...and a rule r with an attribute AttrType:
template <typename Signature> using rule_t =
boost::spirit::qi::rule<Iterator,
Signature,
boost::spirit::qi::standard::space_type>;
rule_t<AttrType()> r;
r = lexeme[alnum >> +(alnum | char_('.') | char_('_'))][
_val = boost::phoenix::bind(&s::f, s_inst, _1)
];
When compiling this (with clang), I get this error message:
boost/phoenix/bind/detail/preprocessed/member_function_ptr_10.hpp:28:72: error: no viable conversion from
'boost::fusion::vector2<char, std::__1::vector<char, std::__1::allocator<char> > >' to 'const std::__1::basic_string<char>'
return (BOOST_PROTO_GET_POINTER(class_type, obj)->*fp)(a0);
^~
It's my impression that the problem is the type of the placeholder variable, _1. Is there a concise way to convert lexeme's attribute to std::string for this purpose?
If I interject an additional rule with an attribute type of std::string, it compiles:
rule_t<std::string()> r_str;
r = r_str[boost::phoenix::bind(&s::f, s_inst, _1)];
r_str = lexeme[alnum >> +(alnum | char_('.') | char_('_'))];
...but this seems a bit awkward. Is there a better way?
You can use qi::as_string[] (which will coerce the attribute into a string if a suitable automatic transformation exists).
Alternatively you can use qi::raw[], which exposes the source-iterator range. This will automatically transform into std::string attributes. The good thing here is that the input can be reflected unaltered (e.g. qi::raw[ qi::int_ >> ';' >> qi::double_ ] will work).
In your case you can probably use as_string[]. But you can also fix the argument to take a std::vector<char> const&.
Finally you could use attr_cast<> to achieve exactly the same effect as with the separate qi::rule<> (but without using the separate rule :)) but I don't recommend it for efficiency and because older versions of boost had bugs in this facility.

undefined behaviour somewhere in boost::spirit::qi::phrase_parse

I am learning to use boost::spirit library. I took this example http://www.boost.org/doc/libs/1_56_0/libs/spirit/example/qi/num_list1.cpp and compiled it on my computer - it works fine.
However if I modify it a little - if I initialize the parser itself
auto parser = qi::double_ >> *(',' >> qi::double_);
somewhere as global variable and pass it to phrase_parse, everything goes crazy. Here is the complete modified code (only 1 line is modified and 1 added) - http://pastebin.com/5rWS3pMt
If I run the original code and pass "3.14, 3.15" to stdin, it says Parsing succeeded, but with my modified version it fails. I tried a lot of modifications of the same type - assigning the parser to global variable - in some variants on some compilers it segfaults.
I don't understand why and how it is so.
Here is another, simpler version, which prints true and then segfaults on clang++, and just segfaults on g++:
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
const auto doubles_parser_global = qi::double_ >> *(',' >> qi::double_);
int main() {
const auto doubles_parser_local = qi::double_ >> *(',' >> qi::double_);
const std::string nums {"3.14, 3.15, 3.1415926"};
std::cout << std::boolalpha;
std::cout
<< qi::phrase_parse(
nums.cbegin(), nums.cend(), doubles_parser_local, ascii::space
)
<< std::endl; // works fine
std::cout
<< qi::phrase_parse(
nums.cbegin(), nums.cend(), doubles_parser_global, ascii::space
) // this segfaults
<< std::endl;
}
You cannot use auto to store parser expressions¹
Either you need to evaluate from the temporary expression directly, or you need to assign to a rule/grammar:
const qi::rule<std::string::const_iterator, qi::space_type> doubles_parser_local = qi::double_ >> *(',' >> qi::double_);
You can have your cake and eat it too: on the most recent Boost versions (possibly only the dev branch) there should be a BOOST_SPIRIT_AUTO macro.
This is becoming a bit of a FAQ item:
Assigning parsers to auto variables
boost spirit V2 qi bug associated with optimization level
¹ I believe this is actually a limitation of the underlying Proto library. There's a Proto-0x lib version on github (by Eric Niebler) that promises to solve these issues by being completely redesigned to be aware of references. I think this required some c++11 features that Boost Proto currently cannot use.

boost::spirit::karma: using no_delimit with alternatives

I'm trying to turn off delimiting around a rule that includes the alternatives operator ('|'), but I'm getting a compile error about incompatible delimiters. As an example, I took the calc2_ast_dump.cpp example from boost, and modified the ast_node rule in struct dump_ast to be:
ast_node %= no_delimit[int_ | binary_node | unary_node];
but this gives the compile error:
/usr/include/boost/function/function_template.hpp:754:17: note: candidate function not viable: no known conversion
from 'const
boost::spirit::karma::detail::unused_delimiter<boost::spirit::karma::any_space<boost::spirit::char_encoding::ascii>
>' to 'const boost::spirit::karma::any_space<boost::spirit::char_encoding::ascii>' for 3rd argument
result_type operator()(BOOST_FUNCTION_PARMS) const
and a relevant comment in boost/spirit/home/karma/nonterminal/rule.hpp:
// If you are seeing a compilation error here stating that the
// third parameter can't be converted to a karma::reference
// then you are probably trying to use a rule or a grammar with
// an incompatible delimiter type.
In my own project, I'm able to do "no_delimit[a << b]" without issue (using the karma::space delimiter).
Is there something I'm missing about alternatives? Why would no_delimit work with '<<', but not '|'?
I'm using boost 1.48, so is there a bugfix I need to pick up?
You need to modify the rule declarations to reflect the fact that they aren't using a delimiter.
Assuming that you didn't want any delimiting, whatsoever:
template <typename OuputIterator>
struct dump_ast
: karma::grammar<OuputIterator, expression_ast()>
{
dump_ast() : dump_ast::base_type(ast_node)
{
ast_node %= int_ | binary_node | unary_node;
binary_node %= '(' << ast_node << char_ << ast_node << ')';
unary_node %= '(' << char_ << ast_node << ')';
}
karma::rule<OuputIterator, expression_ast()> ast_node;
karma::rule<OuputIterator, binary_op()> binary_node;
karma::rule<OuputIterator, unary_op()> unary_node;
};
See it live on http://liveworkspace.org/code/4edZlj$0

boost spirit semantic action parameters

in this article about boost spirit semantic actions it is mentioned that
There are actually 2 more arguments being passed: the parser context and a reference to a boolean ‘hit’ parameter. The parser context is meaningful only if the semantic action is attached somewhere to the right hand side of a rule. We will see more information about this shortly. The boolean value can be set to false inside the semantic action invalidates the match in retrospective, making the parser fail.
All fine, but I've been trying to find an example passing a function object as a semantic action that uses the other parameters (parser context and hit boolean), and I haven't found any. I would love to see an example using regular functions or function objects, as I can barely grok the phoenix voodoo.
This is a really good question (and also a can of worms) because it gets at the interface of qi and phoenix. I haven't seen an example either, so I'll extend the article a little in this direction.
As you say, functions for semantic actions can take up to three parameters:
Matched attribute - covered in the article
Context - contains the qi-phoenix interface
Match flag - manipulate the match state
Match flag
As the article states, the second parameter is not meaningful unless the expression is part of a rule, so let's start with the third. A placeholder for the second parameter is still needed, though; for this, use boost::fusion::unused_type. So a modified function from the article to use the third parameter is:
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>
void f(int attribute, const boost::fusion::unused_type& it, bool& mFlag){
//output parameters
std::cout << "matched integer: '" << attribute << "'" << std::endl
<< "match flag: " << mFlag << std::endl;
//fiddle with match flag
mFlag = false;
}
namespace qi = boost::spirit::qi;
int main(void){
std::string input("1234 6543");
std::string::const_iterator begin = input.begin(), end = input.end();
bool returnVal = qi::phrase_parse(begin, end, qi::int_[f], qi::space);
std::cout << "return: " << returnVal << std::endl;
return 0;
}
which outputs:
matched integer: '1234'
match flag: 1
return: 0
All this example does is switch the match to a non-match, which is reflected in the parser output. According to hkaiser, in boost 1.44 and up setting the match flag to false will cause the match to fail in the normal way. If alternatives are defined, the parser will backtrack and attempt to match them as one would expect. However, in boost<=1.43 a Spirit bug prevents backtracking, which causes strange behavior. To see this, add phoenix include boost/spirit/include/phoenix.hpp and change the expression to
qi::int_[f] | qi::digit[std::cout << qi::_1 << "\n"]
You'd expect that, when the qi::int_ parser fails, the alternative qi::digit would match the beginning of the input at "1", but the output is:
matched integer: '1234'
match flag: 1
6
return: 1
The 6 is the first digit of the second int in the input, which indicates the alternative is taken using the skipper and without backtracking. Notice also that the match is considered successful, based on the alternative.
Once boost 1.44 is out, the match flag will be useful for applying match criteria that might be otherwise difficult to express in a parser sequence. Note that the match flag can be manipulated in phoenix expressions using the _pass placeholder.
Context parameter
The more interesting parameter is the second one, which contains the qi-phoenix interface, or in qi parlance, the context of the semantic action. To illustrate this, first examine a rule:
rule<Iterator, Attribute(Arg1,Arg2,...), qi::locals<Loc1,Loc2,...>, Skipper>
The context parameter embodies the Attribute, Arg1, ... ArgN, and qi::locals template parameters, wrapped in a boost::spirit::context template type. This attribute differs from the function parameter: the function parameter attribute is the parsed value, while this attribute is the value of the rule itself. A semantic action must map the former to the latter. Here's an example of a possible context type (phoenix expression equivalents indicated):
using namespace boost;
spirit::context< //context template
fusion::cons<
int&, //return int attribute (phoenix: _val)
fusion::cons<
char&, //char argument1 (phoenix: _r1)
fusion::cons<
float&, //float argument2 (phoenix: _r2)
fusion::nil //end of cons list
>
>
>,
fusion::vector2< //locals container
char, //char local (phoenix: _a)
unsigned int //unsigned int local (phoenix: _b)
>
>
Note the return attribute and argument list take the form of a lisp-style list (a cons list). To access these variables within a function, access the attributes or locals members of the context struct template with fusion::at_c<>(). For example, for a context variable con:
//assign return attribute
fusion::at_c<0>(con.attributes) = 1;
//get the second rule argument
float arg2 = fusion::at_c<2>(con.attributes);
//assign the second local (_b)
fusion::at_c<1>(con.locals) = 42;
To modify the article example to use the second argument, change the function definition and phrase_parse calls:
...
typedef
boost::spirit::context<
boost::fusion::cons<int&, boost::fusion::nil>,
boost::fusion::vector0<>
> f_context;
void f(int attribute, const f_context& con, bool& mFlag){
std::cout << "matched integer: '" << attribute << "'" << std::endl
<< "match flag: " << mFlag << std::endl;
//assign output attribute from parsed value
boost::fusion::at_c<0>(con.attributes) = attribute;
}
...
int matchedInt;
qi::rule<std::string::const_iterator,int(void),ascii::space_type>
intRule = qi::int_[f];
qi::phrase_parse(begin, end, intRule, ascii::space, matchedInt);
std::cout << "matched: " << matchedInt << std::endl;
....
This is a very simple example that just maps the parsed value to the output attribute value, but extensions should be fairly apparent. Just make the context struct template parameters match the rule output, input, and local types. Note that this type of a direct match between parsed type/value to output type/value can be done automatically using auto rules, with a %= instead of a = when defining the rule:
qi::rule<std::string::const_iterator,int(void),ascii::space_type> intRule;
intRule %= qi::int_;
IMHO, writing a function for each action would be rather tedious, compared to the brief and readable phoenix expression equivalents. I sympathize with the voodoo viewpoint, but once you work with phoenix for a little while, the semantics and syntax aren't terribly difficult.
Edit: Accessing rule context w/ Phoenix
The context variable is only defined when the parser is part of a rule. Think of a parser as being any expression that consumes input, where a rule translates the parser values (qi::_1) into a rule value (qi::_val). The difference is often non-trivial, for example when qi::_val has a class type that needs to be constructed from POD parsed values. Below is a simple example.
Let's say part of our input is a sequence of three CSV integers (x1, x2, x3), and we only care about an arithmetic function of these three integers (f = x0 + (x1+x2)*x3), where x0 is a value obtained elsewhere. One option is to read in the integers and calculate the function afterwards; alternatively, use phoenix to do both.
For this example, use one rule with an output attribute (the function value), an input (x0), and a local (to pass information between individual parsers within the rule). Here's the full example.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
int main(void){
std::string input("1234, 6543, 42");
std::string::const_iterator begin = input.begin(), end = input.end();
qi::rule<
std::string::const_iterator,
int(int), //output (_val) and input (_r1)
qi::locals<int>, //local int (_a)
ascii::space_type
>
intRule =
qi::int_[qi::_a = qi::_1] //local = x1
>> ","
>> qi::int_[qi::_a += qi::_1] //local = x1 + x2
>> ","
>> qi::int_
[
qi::_val = qi::_a*qi::_1 + qi::_r1 //output = local*x3 + x0
];
int ruleValue, x0 = 10;
qi::phrase_parse(begin, end, intRule(x0), ascii::space, ruleValue);
std::cout << "rule value: " << ruleValue << std::endl;
return 0;
}
Alternatively, all the ints could be parsed as a vector, and the function evaluated with a single semantic action (the % below is the list operator and elements of the vector are accessed with phoenix::at):
namespace ph = boost::phoenix;
...
qi::rule<
std::string::const_iterator,
int(int),
ascii::space_type
>
intRule =
(qi::int_ % ",")
[
qi::_val = (ph::at(qi::_1,0) + ph::at(qi::_1,1))
* ph::at(qi::_1,2) + qi::_r1
];
....
For the above, if the input is incorrect (two ints instead of three), bad things could happen at run time, so it would be better to specify the number of parsed values explicitly, so that parsing fails for bad input. The below uses _1, _2, and _3 to reference the first, second, and third match values:
(qi::int_ >> "," >> qi::int_ >> "," >> qi::int_)
[
qi::_val = (qi::_1 + qi::_2) * qi::_3 + qi::_r1
];
This is a contrived example, but should give you the idea. I've found phoenix semantic actions really helpful in constructing complex objects directly from input; this is possible because you can call constructors and member functions within semantic actions.