I am currently learning how to use x3. As the title states, I have had success creating a grammar with a few simple rules, but upon combining two of these rules into one, the code no longer compiles. Here is the code for the AST portion:
namespace x3 = boost::spirit::x3;
struct Expression;
struct FunctionExpression {
std::string functionName;
std::vector<x3::forward_ast<Expression>> inputs;
};
struct Expression: x3::variant<int, double, bool, FunctionExpression> {
using base_type::base_type;
using base_type::operator=;
};
The rules I have created parse input formatted as {rangeMin, rangeMax}:
rule<struct basic_exp_class, ast::Expression> const
basic_exp = "basic_exp";
rule<struct exp_pair_class, std::vector<ast::Expression>> const
exp_pair = "exp_pair";
rule<struct range_class, ast::FunctionExpression> const
range = "range";
auto const basic_exp_def = double_ | int_ | bool_;
auto const exp_pair_def = basic_expr >> ',' >> basic_expr;
auto const range_def = attr("computeRange") >> '{' >> exp_pair >> '}';
BOOST_SPIRIT_DEFINE(basic_expr, exp_pair_def, range_def);
This code compiles fine. However, if I try to inline the exp_pair rule into the range_def rule, like so:
rule<struct basic_exp_class, ast::Expression> const
basic_exp = "basic_exp";
rule<struct range_class, ast::FunctionExpression> const
range = "range";
auto const basic_exp_def = double_ | int_ | bool_;
auto const range_def = attr("computeRange") >> '{' >> (
basic_exp >> ',' >> basic_exp
) >> '}';
BOOST_SPIRIT_DEFINE(basic_expr, range_def);
The code fails to compile with a very long template error, ending with the line:
spirit/include/boost/spirit/home/x3/operator/detail/sequence.hpp:149:9: error: static assertion failed: Size of the passed attribute is less than expected.
static_assert(
^~~~~~~~~~~~~
The header file also includes this comment above the static_assert:
// If you got an error here, then you are trying to pass
// a fusion sequence with the wrong number of elements
// as that expected by the (sequence) parser.
But I do not see why the code should fail. According to x3's compound attribute rules, the inlined portion in the parenthesis should have an attribute of type vector<ast::Expression>, making the overall rule have the type tuple<string, vector<ast::Expression>, so that it would be compatible with ast::FunctionExpression. The same logic applies the more verbose three-rule version, the only difference being that I specifically declared a rule for the inner part and specifically stated its attribute needed to be of type vector<ast::Expression>.
Spirit x3 is probably seeing the result of the inlined rule as two separate ast::Expression instead of the std::vector<ast::Expression> required by the ast::FunctionExpression struct.
To solve it we can use a helper as lambda as mentioned in another answer to specify the return type of a sub-rule.
And the modified range_def would become:
auto const range_def = attr("computeRange") >> '{' >> as<std::vector<ast::Expression>>(basic_exp >> ',' >> basic_exp) >> '}';
Consider:
struct s {
AttrType f(const std::string &);
};
...and a rule r with an attribute AttrType:
template <typename Signature> using rule_t =
boost::spirit::qi::rule<Iterator,
Signature,
boost::spirit::qi::standard::space_type>;
rule_t<AttrType()> r;
r = lexeme[alnum >> +(alnum | char_('.') | char_('_'))][
_val = boost::phoenix::bind(&s::f, s_inst, _1)
];
When compiling this (with clang), I get this error message:
boost/phoenix/bind/detail/preprocessed/member_function_ptr_10.hpp:28:72: error: no viable conversion from
'boost::fusion::vector2<char, std::__1::vector<char, std::__1::allocator<char> > >' to 'const std::__1::basic_string<char>'
return (BOOST_PROTO_GET_POINTER(class_type, obj)->*fp)(a0);
^~
It's my impression that the problem is the type of the placeholder variable, _1. Is there a concise way to convert lexeme's attribute to std::string for this purpose?
If I interject an additional rule with an attribute type of std::string, it compiles:
rule_t<std::string()> r_str;
r = r_str[boost::phoenix::bind(&s::f, s_inst, _1)];
r_str = lexeme[alnum >> +(alnum | char_('.') | char_('_'))];
...but this seems a bit awkward. Is there a better way?
You can use qi::as_string[] (which will coerce the attribute into a string if a suitable automatic transformation exists).
Alternatively you can use qi::raw[] which exposes the source-iterator range. This will automatically transform into std::string attributes. The good thing here is that the input can be reflected unaltered (e.g. qi::raw[ qi::int_ >> ';' >> qi::double_ ] will work
In your case you can probably use as_string[]. But you can also fix the argument to take a std::vector<char> const&
Finally you could use attr_cast<> to achieve exactly the same effect as with the separate qi::rule<> (but without using the separate rule :)) but I don't recommend it for efficiency and because older versions of boost had bugs in this facility.
I just started using Boost::xpressive and find it an excellent library... I went through the documentation and tried to use the ! operator (zero or one) but it doesn't compile (VS2008).
I want to match a sip address which may or may not start with "sip:"
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
using namespace std;
int main()
{
sregex re = !"sip:" >> *(_w | '.') >> '#' >> *(_w | '.');
smatch what;
for(;;)
{
string input;
cin >> input;
if(regex_match(input, what, re))
{
cout << "match!\n";
}
}
return 0;
}`
You just encountered a bug that plagues most of the DSEL.
The issue is that you want a specific operator to be called, the one actually defined in your specific languages. However this operator already exist in C++, and therefore the normal rules of Lookup and Overload resolution apply.
The selection of the right operator is done with ADL (Argument Dependent Lookup), which means that at least one of the objects on which the operator apply should be part of the DSEL itself.
For example, consider this simple code snippet:
namespace dsel
{
class MyObject;
class MyStream;
MyStream operator<<(std::ostream&, MyObject);
}
int main(int, char*[])
{
std::cout << MyObject() << "other things here";
}
Because the expression is evaluated from left to right, the presence of dsel::MyObject is viral, ie the dsel will here be propagated.
Regarding Xpressive, most of the times it works because you use special "markers" that are Xpressive type instances like (_w) or because of the viral effect (for example "#" works because the expression on the left of >> is Xpressive-related).
Were you to use:
sregex re = "sip:" >> *(_w | '.') >> '#' >> *(_w | '.');
^^^^^^ ~~ ^^^^^^^^^^^
Regular Xpressive
It would work, because the right hand-side argument is "contaminated" by Xpressive thanks to the precedence rules of the operators.
However here operator! has one of the highest precedence. At such, its scope is restricted to:
`!"sip:"`
And since "sip:" is of type char const[5], it just invokes the regular operator! which will rightly conclude that the expression to which it applies is true and thus evaluate to the bool value false.
By using as_xpr, you convert the C-string into an Xpressive object, and thus bring in the right operator! from the Xpressive namespace into consideration, and overload resolution kicks in appropriately.
as_xpr helper must be used...
!as_xpr("sip:")
in this article about boost spirit semantic actions it is mentioned that
There are actually 2 more arguments
being passed: the parser context and a
reference to a boolean ‘hit’
parameter. The parser context is
meaningful only if the semantic action
is attached somewhere to the right
hand side of a rule. We will see more
information about this shortly. The
boolean value can be set to false
inside the semantic action invalidates
the match in retrospective, making the
parser fail.
All fine, but i've been trying to find an example passing a function object as semantic action that uses the other parameters (parser context and hit boolean) but i haven't found any. I would love to see an example using regular functions or function objects, as i barely can grok the phoenix voodoo
This a really good question (and also a can of worms) because it gets at the interface of qi and phoenix. I haven't seen an example either, so I'll extend the article a little in this direction.
As you say, functions for semantic actions can take up to three parameters
Matched attribute - covered in the article
Context - contains the qi-phoenix interface
Match flag - manipulate the match state
Match flag
As the article states, the second parameter is not meaningful unless the expression is part of a rule, so lets start with the third. A placeholder for the second parameter is still needed though and for this use boost::fusion::unused_type. So a modified function from the article to use the third parameter is:
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>
void f(int attribute, const boost::fusion::unused_type& it, bool& mFlag){
//output parameters
std::cout << "matched integer: '" << attribute << "'" << std::endl
<< "match flag: " << mFlag << std::endl;
//fiddle with match flag
mFlag = false;
}
namespace qi = boost::spirit::qi;
int main(void){
std::string input("1234 6543");
std::string::const_iterator begin = input.begin(), end = input.end();
bool returnVal = qi::phrase_parse(begin, end, qi::int_[f], qi::space);
std::cout << "return: " << returnVal << std::endl;
return 0;
}
which outputs:
matched integer: '1234'
match flag: 1
return: 0
All this example does is switch the match to a non-match, which is reflected in the parser output. According to hkaiser, in boost 1.44 and up setting the match flag to false will cause the match to fail in the normal way. If alternatives are defined, the parser will backtrack and attempt to match them as one would expect. However, in boost<=1.43 a Spirit bug prevents backtracking, which causes strange behavior. To see this, add phoenix include boost/spirit/include/phoenix.hpp and change the expression to
qi::int_[f] | qi::digit[std::cout << qi::_1 << "\n"]
You'd expect that, when the qi::int parser fails, the alternative qi::digit to match the beginning of the input at "1", but the output is:
matched integer: '1234'
match flag: 1
6
return: 1
The 6 is the first digit of the second int in the input which indicates the alternative is taken using the skipper and without backtracking. Notice also that the match is considered succesful, based on the alternative.
Once boost 1.44 is out, the match flag will be useful for applying match criteria that might be otherwise difficult to express in a parser sequence. Note that the match flag can be manipulated in phoenix expressions using the _pass placeholder.
Context parameter
The more interesting parameter is the second one, which contains the qi-phoenix interface, or in qi parlance, the context of the semantic action. To illustrate this, first examine a rule:
rule<Iterator, Attribute(Arg1,Arg2,...), qi::locals<Loc1,Loc2,...>, Skipper>
The context parameter embodies the Attribute, Arg1, ... ArgN, and qi::locals template paramters, wrapped in a boost::spirit::context template type. This attribute differs from the function parameter: the function parameter attribute is the parsed value, while this attribute is the value of the rule itself. A semantic action must map the former to the latter. Here's an example of a possible context type (phoenix expression equivalents indicated):
using namespace boost;
spirit::context< //context template
fusion::cons<
int&, //return int attribute (phoenix: _val)
fusion::cons<
char&, //char argument1 (phoenix: _r1)
fusion::cons<
float&, //float argument2 (phoenix: _r2)
fusion::nil //end of cons list
>,
>,
>,
fusion::vector2< //locals container
char, //char local (phoenix: _a)
unsigned int //unsigned int local (phoenix: _b)
>
>
Note the return attribute and argument list take the form of a lisp-style list (a cons list). To access these variables within a function, access the attribute or locals members of the context struct template with fusion::at<>(). For example, for a context variable con
//assign return attribute
fusion::at_c<0>(con.attributes) = 1;
//get the second rule argument
float arg2 = fusion::at_c<2>(con.attributes);
//assign the first local
fusion::at_c<1>(con.locals) = 42;
To modify the article example to use the second argument, change the function definition and phrase_parse calls:
...
typedef
boost::spirit::context<
boost::fusion::cons<int&, boost::fusion::nil>,
boost::fusion::vector0<>
> f_context;
void f(int attribute, const f_context& con, bool& mFlag){
std::cout << "matched integer: '" << attribute << "'" << std::endl
<< "match flag: " << mFlag << std::endl;
//assign output attribute from parsed value
boost::fusion::at_c<0>(con.attributes) = attribute;
}
...
int matchedInt;
qi::rule<std::string::const_iterator,int(void),ascii::space_type>
intRule = qi::int_[f];
qi::phrase_parse(begin, end, intRule, ascii::space, matchedInt);
std::cout << "matched: " << matchedInt << std::endl;
....
This is a very simple example that just maps the parsed value to the output attribute value, but extensions should be fairly apparent. Just make the context struct template parameters match the rule output, input, and local types. Note that this type of a direct match between parsed type/value to output type/value can be done automatically using auto rules, with a %= instead of a = when defining the rule:
qi::rule<std::string::const_iterator,int(void),ascii::space_type>
intRule %= qi::int_;
IMHO, writing a function for each action would be rather tedious, compared to the brief and readable phoenix expression equivalents. I sympathize with the voodoo viewpoint, but once you work with phoenix for a little while, the semantics and syntax aren't terribly difficult.
Edit: Accessing rule context w/ Phoenix
The context variable is only defined when the parser is part of a rule. Think of a parser as being any expression that consumes input, where a rule translates the parser values (qi::_1) into a rule value (qi::_val). The difference is often non-trivial, for example when qi::val has a Class type that needs to be constructed from POD parsed values. Below is a simple example.
Let's say part of our input is a sequence of three CSV integers (x1, x2, x3), and we only care out an arithmetic function of these three integers (f = x0 + (x1+x2)*x3 ), where x0 is a value obtained elsewhere. One option is to read in the integers and calculate the function, or alternatively use phoenix to do both.
For this example, use one rule with an output attribute (the function value), and input (x0), and a local (to pass information between individual parsers with the rule). Here's the full example.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
int main(void){
std::string input("1234, 6543, 42");
std::string::const_iterator begin = input.begin(), end = input.end();
qi::rule<
std::string::const_iterator,
int(int), //output (_val) and input (_r1)
qi::locals<int>, //local int (_a)
ascii::space_type
>
intRule =
qi::int_[qi::_a = qi::_1] //local = x1
>> ","
>> qi::int_[qi::_a += qi::_1] //local = x1 + x2
>> ","
>> qi::int_
[
qi::_val = qi::_a*qi::_1 + qi::_r1 //output = local*x3 + x0
];
int ruleValue, x0 = 10;
qi::phrase_parse(begin, end, intRule(x0), ascii::space, ruleValue);
std::cout << "rule value: " << ruleValue << std::endl;
return 0;
}
Alternatively, all the ints could be parsed as a vector, and the function evaluated with a single semantic action (the % below is the list operator and elements of the vector are accessed with phoenix::at):
namespace ph = boost::phoenix;
...
qi::rule<
std::string::const_iterator,
int(int),
ascii::space_type
>
intRule =
(qi::int_ % ",")
[
qi::_val = (ph::at(qi::_1,0) + ph::at(qi::_1,1))
* ph::at(qi::_1,2) + qi::_r1
];
....
For the above, if the input is incorrect (two ints instead of three), bad thing could happen at run time, so it would be better to specify the number of parsed values explicitly, so parsing will fail for a bad input. The below uses _1, _2, and _3 to reference the first, second, and third match value:
(qi::int_ >> "," >> qi::int_ >> "," >> qi::int_)
[
qi::_val = (qi::_1 + qi::_2) * qi::_3 + qi::_r1
];
This is a contrived example, but should give you the idea. I've found phoenix semantic actions really helpful in constructing complex objects directly from input; this is possible because you can call constructors and member functions within semantic actions.
I'd like to parse simple C++ typedef instructions such as
typedef Class NewNameForClass;
typedef Class::InsideTypedef NewNameForTypedef;
typedef TemplateClass<Arg1,Arg2> AliasForObject;
I have written the corresponding grammar that i'd like to see used in parsing.
Name <- ('_'|letter)('_'|letter|digit)*
Type <- Name
Type <- Type::Name
Type <- Name Templates
Templates <- '<' Type (',' Type)* '>'
Instruction <- "typedef" Type Name ';'
Once this is parsed, all i'll want to do is to generate xml with the same information (but layed out differently)
What is the most effective language for writing such a program ?
How can you achieve this ?
EDIT : What i have come up with using Boost Spirit (it's not perfect, but it's good enough for me, at least for now)
rule<> sep_p = space_p;
rule<> name_p = (ch_p('_')|alpha_p) >> *(ch_p('_')|alpha_p|digit_p);
rule<> type_p = name_p
>> !(*sep_p >>str_p("::") >> *sep_p>> name_p)
>> *(*sep_p >> ch_p('*') )
>> !(*sep_p >> str_p("const"))
>> !(*sep_p >> ch_p('&'));
rule<> templated_type_p = name_p >> *sep_p
>> ch_p('<') >> *sep_p
>> (*sep_p>>type_p>>*sep_p)%ch_p(',')
>> ch_p('>') >> *sep_p;
rule<> typedef_p = *sep_p
>> str_p ("typedef")
>> +sep_p >> (type_p|templated_type_p)
>> +sep_p >> name_p
>> *sep_p >> ch_p(';') >> *sep_p;
rule<> typedef_list_p = *typedef_p;
I would alter the grammar slightly
ShortName <- ('_'|letter)('_'|letter|digit)*
Name <- ShortName
Name <- Name::ShortName
Type <- Name
Type <- Name Templates
Templates <- '<' Type (',' Type)* '>'
Instruction <- "typedef" Type Name ';'
Also your grammar leaves out the following cases
Multiple typedef targets.
Pointer targets
Function pointers (this is by far the most difficult)
Parsing a grammar (i love the irony) is a fairly straight forward operation. If you wanted to actually use the grammar in a functional way, I would say the best bet is a lex/yacc combination.
But from your question it appears that you want to spit it out to another format. There really isn't a language designed for this so I would say use whatever language you're most comfortable with.
Edit
The OP asked about multiple typedef targets. It's perfectly legally for a typedef declaration to have more than 1 target. For Example:
typedef _SomeStruct SomeStruct, *PSomeStruct
This creates 2 typedef names.
SomeStruct which is equivalent to "struct _SomeStruct"
PSomeStruct which is equivalent to "struct _SomeStruct*"
Well, since you're apparently already working with/on C++, have you considered using Boost.Spirit? This allows you to hard-code the grammar inline in C++ as a domain-specific language and program against it in normal C++ code.